

Bob Katz





Contents


Foreword

Introduction

Part I: Preparation

Chapter 1 No Mastering Engineer is an Island

Chapter 2 Connecting It Together

Chapter 3 An Earientation Session

Chapter 4 Wordlengths and Dither

Chapter 5 Decibels for Dummies

Chapter 6 Monitoring

Part II: Mastering

Chapter 7 Putting the Album Together

Chapter 8 Equalization Techniques

Chapter 9 How to Manipulate Dynamic Range for Fun & Profit
Part One: Macrodynamics

Chapter 10 Part Two: Downward Processes

Chapter 11 Part Three: The Lost Processes

Chapter 12 Noise Reduction

Chapter 13 Other Processing

Part III: Advanced Theory and Practice

Chapter 14 How to Make Better Recordings in the 21st Century
Part One: Monitor Calibration

Color Plates

Chapter 15 How to Make Better Recordings in the 21st Century
Part Two: The K-System, an Integrated Approach to Metering, Monitoring and Leveling Practices

Chapter 16 Analog and Digital Processing

Chapter 17 How to Achieve Depth and Dimension in Recording, Mixing and Mastering

Chapter 18 High Sample Rates: Is This Where It's At?

Chapter 19 Jitter: Separating the Myths from the Mysteries

Chapter 20 Tips and Tricks




Part IV: Out of the Jungle

Chapter 21 Education, Education, Education

Chapter 22 At Last

Part V: Appendices

Appendix 1 Radio Ready: The Truth

Appendix 2 The Tower of Babel: Audio File Formats

Appendix 3 Preparing Tapes and Files

Appendix 4 Tape Label/Log

Appendix 5 Table of Decibels

Appendix 6 Q vs. Bandwidth

Appendix 7 I Feel the Need for Speed

Appendix 8 I Feel the Need for Capacity

Appendix 9 Footnotes on the K-System

Appendix 10 Recommended Reading, Test CDs

Appendix 11 Biography: Eric James

Appendix 12 Biography: Robert A. Katz

Appendix 13 Glossary

Index

Afterword





















Foreword


By Roger Nichols

When a recording artist I produced heard a great song on the radio, he would turn to me and say,
"I was going to write that song!" After reading this book my reaction was, "I was going to write this
book!" Well, I am glad Bob beat me to it because it looks like he did a much better job than I could
have.

What places this book head and shoulders above the rest is the attention to useful detail. 

Instead of some hyperbole, the reader can actually put these methods to good use. The descriptions
of how to perform a task are augmented with the reason that you should perform the task. Not just 
how downward compressors work, but when and why you would want to use them. Science is 
meaningless without art. 

How do I tell if the digital signal is 16 bit or 24 bit? What does noise shaping do? Should I mix at
96 kHz? How do you make something 3 dB louder when it is already lighting up the over lights? 
Should I mix to analog or digital? How do I set up my speakers for mixing surround? Which weighs 
more, a pound of gold or a pound of feathers? These are some of the questions that Bob answers in 
a clear and concise style. 

Bob enters each mastering session with his eyes wide open. Each project is unique, and each 
mastering session will require a unique approach to bring out the very best results. Bob’s musical 
background helps him select the proper tools for the job. Knowing that a string quartet record does
not require the same approach as the Backstreet Boys record is half the battle.

Every day clients ask for louder and louder CDs when they come to a mastering session. It is
very hard to find Hi-Fidelity CDs these days. Now that you can do your own recording to a digital
workstation, buy your own multi-band compressors and burn your own CDs, who needs
mastering? My answer is that if you record your own projects at home, you need mastering more
than the producer who works with the top engineers in the top studios. The key is outside
reference. No, I don’t mean that your neighbor came over and said, "Hey, that sounds really great!"
I mean reference to other projects, and reference to other engineers who have worked on great-
sounding CDs.




Bob does an excellent job of dispelling the myth that the louder you make your CD, the louder it
will be on the radio. Read this part more than once. Once the reality sinks in, then maybe we will 
have more viable candidates for a Best Engineered CD Grammy, instead of having to choose a CD for 
the Least Offensive Engineering award. 

The professional mastering engineer works on material from all corners of the music business. 
This is the last stop before the CD hits the radio and the record stores. The smartest thing any 
mixing engineer can do is leave the final loudness tools to the loudness professional. 

Limiters and compressors should be treated just like firearms. There should be guides for the 
proper use and classes you must take before you can own one. That class is here in this book. After 
you read this "audio firearms manual” you will have a much better understanding of the mastering 
process. You will know when and how to use these tools yourself and when to leave it to the
professional. Treat every compressor/limiter as a loaded weapon, and don’t point it at anyone 
unless you intend to use it. It’s the LAW! 

I get e-mail quite often from independent artists who are recording their music at home and
want to know what gear to buy to help them mix before they send it to me for mastering. I tell them 
that the first piece of equipment they should buy is Bob Katz’s Mastering Audio, The Art and the 
Science. 


Roger Nichols
Miami, August 2002




Introduction


What Is Mastering?

Mastering is the last creative step in the audio production process,
the bridge between mixing and replication: your last chance to enhance
sound or repair problems in an acoustically designed room, an audio
microscope. Mastering engineers lend an objective, experienced ear to
your work; we are familiar with what can go wrong technically and
esthetically. Sometimes all we may do is nothing! The simple act of
approval means the mix is ready for pressing. Other times we may help
you work on that problem song you just couldn’t get right in the mix,
or add the final touch that makes a record sound finished and playable
on a wide variety of systems.

The Approach of this Book 

The mastering studio is the place where experience in the musical art
is combined with the science of audio, but the dividing line between
art and science is nebulous, and so my book constantly tries to
integrate the art and the science.

Technology changes so fast in today’s world; for example, no one
predicted that a rapid proliferation of digital cameras would threaten
the once-mighty Polaroid Corp. Five years after this book is published,
one-third of its technical information will be outdated. Ten years from
today, one-third or more of its technical information will be obsolete.
But old-fashioned craftsmanship and attention to detail will always be
in demand. I hope that even fifty or one hundred years from today,
mastering engineers will still be considered craftspersons. I hope that
the artistic and procedural information provided herein will always be
precious to students of the art of audio mastering.

Attention Gearheads 

This book is designed to help you learn to make informed decisions on
your own: how audio equipment works, and what happens when you turn the
knobs. Just about every day I get a letter like this one from engineers
asking me to approve or bless their particular list of equipment:

Dear Bob, I always master with a Sis-boom-bah brand compressor and
equalizer, then follow it off with a touch of a Franifras enhancer. On
the next pass I use a Caramba tool to maximize the sound and then
Whosizats dither before going to CD. Please tell me what you think of
my choices? Sincerely, Gearhead.


On Language 

Sex is good! And being sexy can be fun! I feel that language should be
sexy, too, and our centuries-old male-centric language must be rather
wearying to the women in our society. It’s time to put some vitality
back into our syntax. Thus, you will find that in one chapter of this
book, the Mastering Engineer may be a female, and in another, male!
Vive la différence!






I usually reply, politely,

Dear Gearhead, your equipment list sounds pretty extensive, but much
more important is how you use it. For example, some of the gear you
describe would be entirely inappropriate for some kinds of music....

If there is one essential piece of information you can get from this
book it is this aphorism written by master engineer Glenn Meadows:

"There is no magic silver bullet. There is no one magic anything that
will be best in all situations. The ability of the operator to
determine what it is that needs to be done and pick the best
combination of tools is more important than what tools are used."
- Glenn Meadows

Glenn’s statement also applies to the amount or setting of each knob or
control within your equipment. There is no magic threshold, or EQ
setting, or ratio, or preset that will turn ordinary sound into magic.
Sonic magic comes from the hard work you put into using your tools
(musical magic can only come from the music itself). The truth is that
in a typical mastering session, each tool makes only an incremental
improvement, and the final result comes from the synergistic totality
of the tools working together. In these days of mass gear-marketing by
competitive manufacturers there is too much emphasis on the glitz,
fashion and style of the gear rather than on its sound quality and
principles of operation. While this book is definitely for gearheads
(in the sense that it has lots of glitzy pictures and descriptions of
gear designed to produce good sound), serious engineers who want to
improve their techniques will also find out how their devices function.
Audio principles never go out of style, but models of gear will always
fade away.

The theories and background covered here are what I consider to be the
minimum necessary to become a competent audio engineer in this digital
age. I do not include any heavy mathematical formulas in the main text
(you will find more thorough explanations in the footnotes). There are
plenty of good foundational basics for beginners, and the most
experienced digital design engineer will find useful detail. I include
practical examples at every stage; but if the going gets difficult at
any point, simply move to the next section. As you grow in experience,
when you revisit those sections you may have skipped, everything will
seem less abstract. I try to define any special terms the first time
you meet them; terms can also be found in the glossary (Appendix 13)
and in the index. Just like a well-sequenced record album, the chapters
in this book were designed to be read sequentially.




A Taste of This Book: Chapter by Chapter 

Part I of the book is called Preparation. The mastering engineer has
tremendous power, and with that power comes great responsibility.
Although it is possible to turn an ordinary mix into a glorious-
sounding production, sadly it is also possible to ruin a piece of
delicate music by applying the wrong approach.

Chapter 1: No Mastering Engineer is an Island, outlines the steps taken
in producing a record album, our mastering philosophy, workflow and
procedures.

Chapter 2: Connecting It All Together, presents the block diagram of a
mastering studio and a general equipment description.

Chapter 3: An Earientation Session, shows how we develop listening
skills.

Chapter 4: Wordlengths and Dither, is a simplified explanation of one
of digital audio’s technical mysteries.

Chapter 5: Decibels for Dummies, describes how level meters work, the
myths of normalization, and how to effectively interface analog and
digital equipment.

Chapter 6: Monitoring, demonstrates the need for accurate monitoring
and proper room acoustics.


Part II is called Mastering Techniques: the important techniques and
processes we use in a mastering session.

Chapter 7 shows that Putting The Album Together is a critical art and
science.

Chapter 8: Equalization, differentiates EQ practice for mastering from
that used in tracking/mixing.

Next comes our dynamics trilogy: How To Manipulate Dynamic Range For
Fun And Profit, in three parts, Chapters 9-11, covering dynamics
processing, theory and philosophy from A-Z.

Chapter 12: Noise Reduction, includes both manual and automatic noise
reduction techniques.

Chapter 13: Other Processing, includes such tricks of the trade as M/S
processing, classic and not-so-classic specialized analog and digital
processors.

Part III: Advanced Theory and Practice, begins with a two-part series:
How To Make Better Recordings in the 21st Century, Chapters 14-15.

Chapter 14: Monitor Level Calibration, shows how to set up and
calibrate a stereo or 5.1 monitor system, and how to use the simple
tool of the monitor knob’s position to help judge program loudness and
quality.

MYTH: Digital Audio requires less technical skill to use than analog.

Chapter 15: The K-System, is my proposal for a 21st century approach to
metering and monitoring to help us produce more consistent and better-
sounding recordings.

Chapter 16: Analog And Digital Signal Processing, describes some of the
analytical tools we use to look at sound and investigates the non-
linear relationship between equipment measurements and auditory
perception.

Chapter 17: How To Achieve Depth and Dimension in Recording, Mixing and
Mastering, studies the powerful classic techniques for obtaining space
and depth in 2-channel stereo so as to make an effective move on to
surround.

Chapter 18: High Sample Rates, Is This Where It’s At? tells us why it’s
still important to use a high-bandwidth system even though our ears are
only good to 20 kHz (on a good day!).

Chapter 19: Jitter: Separating the Myths From the Mysteries, is a
direct and definitive layman’s explanation of the topic.

Chapter 20: Tips and Tricks, digs into the practical aspects of making
AES/EBU and S/PDIF work for you and provides other little-known tips to
ease your audio life.




Part IV: Out of the Jungle, presents some of my personal conclusions.

In Chapter 21: Education, Education, Education, we get to preach what
we practice!

Chapter 22: At Last, is a contemplative poem, my hopes and dreams of
our musical and audio future.

Part V: Appendices contains some very useful information, including:

• How to prepare tapes and files for mastering

• Radio Ready, The Truth, largely written by guest authors Robert Orban
  and Frank Foti, with a contribution by Tardon Feathered, shows how
  radio processing severely affects our mixes and debunks for all time
  the myth that super-hot recordings sound better over the radio

• I Feel The Need For Speed, a comparison of transfer speeds

• Recommended Reading

• Glossary

• Audio File Formats

Plus, visit the digido.com website for an online companion to this
book:

• An Honor Roll of great-sounding Pop CDs, newly compiled for this book

• URLs and websites with mastering resources


Now that you’ve had a taste, let’s begin Mastering Audio...


PART I: PREPARATION

"Getting ready is half the job."
- Anon




Chapter 1

No Mastering Engineer Is An Island


I. In The Beginning

This chapter is about the philosophy of mastering and the mastering
engineer’s approach to audio. We begin by reviewing the place of
mastering in the overall scheme of producing and manufacturing a
record.

The Record Album: from Conception to Finished Product

In the beginning was the word (and the music). And that shall never
change. But consumer formats do change, and I’m going to miss the
Compact Disc when it becomes obsolete; it is probably the first and
last professional audio medium that can be created, nurtured and
mastered by a single individual. The CD is much easier to produce than
the LP, because computer technology has removed forever the words
rewind and razorblade from our working vocabulary. But now even the
simplest DVD-A demands a team effort: specialists in audio, video
(menus or stills), and interactivity. And quality control for
multichannel requires great time and attention to detail.


[Figure: Compact Disc Project: From Conception to Manufacturing. Stages recoverable from the flowchart: Multitrack Master; 2 to 6 Track Stems; Premaster (Tape or Disc or File on Server), the Last Artistic/Esthetic Step; To Manufacturing Plant.]


































The preceding figure outlines the major artistic and technical steps in
Compact Disc or SACD production, from the conceptual beginning through
to the finished technical product.

The song composition and overall conception of the album take shape in
a gestation period that can last for years, with contributions from the
artist, the producer, the record company A&R or all three. Then
arrangements are written, musicians are hired, and the artists go into
the recording studio or on location for the recording to multitrack.
This may seem terribly antiquated to those who can record an entire
"virtual orchestra" in their project studio, but my personal hope is
that the rich art of musical collaboration, with musicians actually
playing together "live," never goes away.

Tracking...in the not-so-distant future 

The accepted medium for the multitrack recording is rapidly becoming
the computer hard disc as a replacement for tape-based formats. In the
not-so-distant future universal storage will be so large, and Internet
communication so speedy, that the need for a local physical multitrack
"machine" may eventually entirely disappear. A single central server
will provide all our computing and audio needs. The artist will be able
to fly from Seattle to San Francisco without carrying anything, plug
into the Internet, and continue overdubbing! However, before this can
happen, Internet bandwidth to the home and studio will have to increase
by a few orders of magnitude. This also means that the mastering
process will involve the mastering engineer simply accessing the
relevant tracks from the central server instead of being sent tapes by
FedEx.
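
To get a feel for the "few orders of magnitude," it helps to estimate the raw data rate of an uncompressed multitrack session. The sketch below is only an illustration: the track count, sample rate, wordlength and the 56 kbit/s dial-up figure are assumed example values, not numbers taken from this chapter.

```python
import math


def pcm_bit_rate(tracks, sample_rate_hz, bits_per_sample):
    """Raw bit rate, in bits per second, of uncompressed linear PCM."""
    return tracks * sample_rate_hz * bits_per_sample


# Assumed example session: 24 tracks at 48 kHz, 24-bit.
session_bps = pcm_bit_rate(tracks=24, sample_rate_hz=48_000, bits_per_sample=24)

# Assumed home connection for comparison: a 56 kbit/s dial-up modem.
dialup_bps = 56_000

print(f"{session_bps / 1e6:.1f} Mbit/s")                          # 27.6 Mbit/s
print(f"{math.log10(session_bps / dialup_bps):.1f} orders of magnitude")  # 2.7
```

Under these assumptions, real-time overdubbing against a remote server would need a link several hundred times faster than the assumed dial-up connection.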


Mixing 

After the tracking is complete the producer, artist and mixing engineer
produce the mix of each song or section of the work. If mixing to
stereo, the mix goes to two tracks, but even then it may be divided
into several 2-track stems so that the mastering engineer can tweak the
interrelationship between leads and rhythm if it proves necessary after
mastering processing, or in the light of the reference monitoring at
the mastering house. If mixing for surround, the mix may go to six or
more tracks; and if divided in stems, the vocal, rhythm and lead stems
may take up 18 or more tracks!
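
The track counts follow from simple multiplication, since each stem carries a full set of channels. A minimal sketch, assuming the three stems named above and a six-channel (5.1) surround layout; the function name is made up for this example:

```python
def stem_track_count(stems, channels_per_stem):
    """Total tracks needed when every stem carries a full channel set."""
    return len(stems) * channels_per_stem


stems = ["vocal", "rhythm", "lead"]

print(stem_track_count(stems, channels_per_stem=2))  # 6 tracks for 2-track stems
print(stem_track_count(stems, channels_per_stem=6))  # 18 tracks for 5.1 stems
```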

Editing and Premastering 

The next step, editing, may be carried out at either the recording
studio or at the mastering house. It is followed by premastering, which
is the official name of our profession, to distinguish it from the
technical mastering that takes place at the plant (though everyone
calls us mastering engineers for short). Premastering can include the
artistic and technical tasks of sequencing (putting the album in song
order), dynamics processing, leveling, equalization, noise reduction,
even some mixing, described in detail in later chapters. Naturally, the
output medium of premastering is officially called the premaster, but
we usually label it master.

At the Plant 

At the plant, the premaster is used to create the glass master, an
ephemeral product that actually gets destroyed during the production
process! At many plants, glass mastering is performed in a class 10
clean room (or better) by engineers wearing white "space suits"
(affectionately known as monkey suits). An alternative is that some
plants house their LBRs (laser beam recorders) in a self-contained
clean room that can be loaded up in the morning by one suited
individual and run all day without intervention, just observation
through a Plexiglas window. The LBR is a multi-million dollar machine
that takes the digital information for the master, encodes it* to the
proper format and then sends an encoded laser beam onto a light-
sensitive emulsion applied to the surface of a 9.5” glass disc. The
on-off laser pattern generates a series of pits and lands after the
emulsion is developed. The coated glass disc is then moved to another
clean room, where the emulsion is sputtered with a fine nickel alloy in
a process called metallization. Next, the glass plate is put in a vat
where an electrical charge is applied, allowing the surface to be
plated, in a process called electroforming. After plating, the metal
plate is peeled off and the glass surface can be cleaned and reused for
a new master.

* The encoding includes EFM modulation and error correction
information. The exact nature of Compact Disc and DVD encoding is
beyond the scope of this book. Further references can be found in
Appendix 10.

This first metal plate is called the father and is the inverse of the
final CD (pits are lands and vice versa). For small runs, the father
can be used directly as a stamper. But for any significant quantity,
the father is electroformed to create a mother (which is the inverse of
the father) from which many stampers can be produced. Each stamper goes
into a press, where a clear polycarbonate disc is inserted and molded.
Afterwards, the disc is metallized with an aluminum reflective layer
(gold can be used in specialty pressings) and coated with a protective
lacquer. Finally, a silk-screened or offset label is applied to the top
of the disc, which is then packaged with booklets into the CD boxes by
automated machinery. Every element must be carefully inspected for
defects: booklets must be properly trimmed, cardboard seams must not
tear, CD surfaces must not be stained, labeling should look clean, and
the CD itself must meet the proper tests for pit depth and spacing
(e.g. jitter and RF output tests). It’s an exacting process but....

DVDs are even more complex 

Although producing a DVD or DVD-A is very similar to producing a CD, it
requires a much greater magnitude of precision. This is because a
one-sided DVD contains about 7 times the information density of CD, and
thus costs more in the creative, technical and manufacturing stages.
The creative department has to generate the graphics and menu copy and
the plan for interactivity well in advance of the authoring stage;
furthermore, all of these elements might be in constant flux until the
reference audio track has been firmly edited and mastered. Finally, at
the plant, DVDs require much more stringent QC standards than CDs,
especially because of the delicate bonding process for a multi-layer
DVD.
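
The "about 7 times" figure can be sanity-checked against nominal disc capacities. The numbers below are commonly quoted nominal values and are assumptions of this sketch rather than figures given in the text: 4.7 GB for a single-sided, single-layer DVD and 650 to 700 MB for a CD. Because both discs have the same diameter, the capacity ratio is a rough proxy for the density ratio.

```python
DVD_SINGLE_LAYER_MB = 4700      # assumed nominal capacity of a one-sided DVD
CD_CAPACITIES_MB = (650, 700)   # assumed nominal CD capacities


def capacity_ratio(dvd_mb, cd_mb):
    """How many times more data the DVD holds than the CD."""
    return dvd_mb / cd_mb


for cd_mb in CD_CAPACITIES_MB:
    print(f"{cd_mb} MB CD: {capacity_ratio(DVD_SINGLE_LAYER_MB, cd_mb):.1f}x")
# 650 MB CD: 7.2x
# 700 MB CD: 6.7x
```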

II. Mastering Philosophy and Procedures

For every good mastering engineer, meticulousness and attention to
detail are the norm, not the exception. We’ve always been called upon
to keep careful track of a project from the time it arrives until it
becomes the final product. Days, weeks, or perhaps years later, if
revisions are called for, the client has a reasonable chance of
ascertaining which processes were used by consulting with the mastering
engineer. At RCA Records through the 80’s, analog tape box labels
included "dash numbers" (e.g. -1, -2, -3) for each copy generation, and
a card catalog carefully logged the tape’s status and which one was the
correct master to use for LP or cassette duplication. When masters were
sent for disc cutting, the cutting engineer inserted a written log
indicating the Pultec or other equalizer settings they used, left/right
channel gains, and so on.

Today, the situation is far more complicated than simply looking in a
tape box for cutting information and marking the box with the
generation number. Audio-only projects may arrive in multiple forms,
from DATs to Pro Tools hard discs to CD-ROMs to analog tapes. Projects
may be two-channel or multichannel surround; they may arrive as full
mixdowns, partial mixdowns (stems) or combinations. The definition of
what is the Master becomes even more vague, since multimedia projects
may be finished at the audio mastering studio, or authoring added at
some studio down the road. Metadata (see Chapter 15), including
watermarking, may be added during a later authoring stage, further
complicating the situation.

But one thing has not changed: it is the responsibility of the
mastering engineer to ensure that the audio quality which leaves the
mastering studio is the same quality that will be represented on the
final medium. We must be familiar with what may happen to the project
when it leaves our office, and we must familiarize the producer with
what is necessary to preserve the audio quality. I believe in the
concept of the Mastering Studio as the Mothership, the coordinator of
audio quality, and perhaps more, if we’ve also taken over the authoring
duties. In these days of Multimedia, DVDs and SACDs, it is possible
that the sound we mastered may be further manipulated by a video
engineer, or by some individual who is not skilled in audio production,
which is truly counterintuitive. All the more reason for the mastering
studio to take on the Mothership role.

Attention to detail. The last 10% of the job takes 90% of the time.

Now, let’s examine the steps, tools and 
processes involved in mastering a project. 





[Figure: excerpt from a Preparation Log in three columns. Legible entries are transcribed below; "[illegible]" marks text that could not be recovered.]

Load-In

Disc [illegible]. 48/24 bit [illegible] from tracks 7/8 of an ADAT 48 kHz tape.

1/[illegible] Mr. Western [illegible]. Was truncated (somewhere) to 16 bits. Lead is a little too loud compared to the rhythm. [illegible]

2/[illegible] Old Sun. "I notice some of the lead vocals on the [illegible] sound a bit held back. We tried to compensate [illegible] peakiness". BK: Vocals seem too laid back. Try to give it a bit more bounce. "Noise in the tail out". Bs is all there but a hair washy in [illegible] frequencies. Violin is a [illegible].

3/Hustle And Bustle. [illegible] instrumental. [illegible] it more and make it bigger.

4/[illegible] It Just Make You Wonder. Mix #1. Country rock. A bit thin. Fatten it. (If we use this mix). Should I add some reverb?

Load-Out

Mon = -7 dB Ref. [illegible]

Sound comments: This recording is ADAT grainy. Fun music, good stereo image. Sometimes too much reverb. Needs some fullness.

For 48/24 capture, used [illegible] w/no filters and no [illegible].

Session: Router Patch: Z Sys #9D, with K-Stereo, TC and Weiss.

Route: M3/M4 out to
- K-Stereo, switched memories by sequence, to
- TC, doing EQ and sometimes low level comp. There’s a reverb module in the chain but it’s not used (all dry). Cranesong set to Tape 3 is inserted by automated routing in the TC. To
- Weiss, varied with snapshots (L2 is bypassed).
Studio Vision sequence changes the parameters.

Revisions

Rev. 2: He sent a dry version of Margaret’s Waltz, so I added my "better" verb in the session, revised the settings for the TC 6000 only, created a new sequence, and captured just track 5 to 48K. Then SRC and insert into the EDL 44.1K ver. 2. Then increased the space between tunes by 1/2 second and out to DDP with POW-R 3 dither.

This excerpt from a Preparation Log contains details of Load-in, Load-out, and any Revisions made.


III. Logging 

Preparation Logs 

As we have seen, a multimedia project may have video elements,
graphics, menus, etc. CD Audio projects are usually a lot simpler. The
figure above is a sample preparation log of a CD project, containing
information about load-in, load-out, and any revisions.


Every mastering engineer has a different approach, but the object of
all logging is to be able to reconstruct what was done during the
mastering session so as to make revisions or changes easier. In column
1 I put my notes on the original sources (with the client’s comments in
quotation marks to distinguish them from my own), column 2 is used for
loadout notes, and column 3 for revision notes. Of particular interest
are the monitor gain, which is logged, and the settings of the
processors. Note that most of the digital processor settings are
digitally stored in the processor’s memory and then saved on floppy
disc or Sysex dump or other medium. If analog processors are used, we
make verbal descriptions or pictures of the positions of the controls
(e.g., "band four boosted 2 clicks at 4.7 kHz, Q = 0.7"). In this
revision, since settings for the TC 6000 were changed for tune #5, the
floppy disc for the TC contains two files, one labeled revision 2, for
a complete historical record. During loadout, I use a fully automated
technique controlled by a MIDI sequencer; the only processor with a
manual setting in the above master is the Cranesong HEDD, whose "tape"
control has been
set to position *5. 
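
One way to picture such a log is as a small structured record. The sketch below is hypothetical: the class, its field names and the example values are illustrative assumptions, not the layout of the log actually used at the author’s studio.

```python
from dataclasses import dataclass, field


@dataclass
class PreparationLog:
    """Hypothetical three-column mastering log (all names are illustrative)."""
    project: str
    load_in: list = field(default_factory=list)    # column 1: source notes
    load_out: list = field(default_factory=list)   # column 2: loadout notes
    revisions: list = field(default_factory=list)  # column 3: revision notes
    monitor_gain_db: float = 0.0                   # logged monitor gain
    processor_settings: dict = field(default_factory=dict)

    def note_source(self, text, from_client=False):
        # Client comments are wrapped in quotation marks to tell them
        # apart from the engineer's own notes.
        self.load_in.append(f'"{text}"' if from_client else text)

    def add_revision(self, text):
        # Append a revision note and return how many have been logged.
        self.revisions.append(text)
        return len(self.revisions)


log = PreparationLog(project="Example CD", monitor_gain_db=-7.0)
log.note_source("Noise in the tail out", from_client=True)
log.note_source("Vocals seem too laid back.")
log.processor_settings["analog EQ"] = "band four boosted 2 clicks at 4.7 kHz, Q = 0.7"
log.add_revision("Changed one processor's settings; saved as a second file.")
print(log.load_in)
```

Keeping processor settings and revisions in the same record as the source notes is what makes it possible to reconstruct a session later.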

At our studio, an automatic computer network tape backup covers audio
logs and sequences as well as all the mundane items such as word
processing and accounting. Since computer systems and processors are
evolving at Roadrunner pace, we also keep a high-resolution capture of
the master just in case processors, applications or operating systems
won’t recover the old settings. Some clients are insisting on analog
tape safeties, since this seems to be the only medium exempt from the
technical obsolescence ironically known as progress.







LABEL: Boa Music                        DATE: November 24, 2001
TITLE: Alma De Buxo                     SOURCE: DIG   ANALOG X
ARTIST: Susana Seivane                  FORMAT: DDP, v.1.0, PQ 9 Head
CH Nn. 10002078                         MASTER X   SAFETY
NO EMPHASIS X   EMPHASIS                SAMPLING FREQ. 44.1 kHz
DIGITAL HEADROOM 0/0 dB                 MASTERING ENG: BK

This master was created on Sonic Solutions V5. All levels, fades & PQ times are client
approved. Please do not alter in any way. Please refer all technical questions to
Digital Domain at (407) 831-0233.

UPC/EAN CODE: 0804071020727

TRK IDX TITLE/ISRC                  COPY EMPH   NO OFFSET    OFFSET       OFFSET       CD
                                                TIME         TIME         DURATION     TIME
                                                hh:mm:ss:ff  hh:mm:ss:ff  hh:mm:ss:ff  mm:ss:ff

  1     ES6080132801                OFF  OFF  A
      0 Pause                                  -00:00:00:19 -00:00:00:29  00:00:02:00  00:00:00
      1 1/Vai De Polcas                         00:00:01:11  00:00:01:01  00:03:48:04  00:02:00
        TOTAL:                                                            00:03:50:04

  2     ES6080132802                OFF  OFF  A
      1 2/A Farándula                           00:03:49:10  00:03:49:05  00:02:53:16  03:50:10
        TOTAL:                                                            00:02:53:16

  3     ES6080132803                OFF  OFF  A
      1 3/Sainza-Riofrio                        00:06:42:26  00:06:42:21  00:04:04:02  06:43:50
        TOTAL:                                                            00:04:04:02

  4     ES6080132804                OFF  OFF  A
      0 Pause                                   00:10:46:21  00:10:46:23  00:00:03:14  10:47:55
      1 4/Roseiras De Abril                     00:10:50:12  00:10:50:07  00:03:59:23  10:51:15
        TOTAL:                                                            00:04:03:07

  5     ES6080132805                OFF  OFF  A
      0 Pause                                   00:14:49:28  00:14:50:00  00:00:02:02  14:50:72
      1 5/Xoaniña                               00:14:52:07  00:14:52:02  00:02:58:00  14:53:02
        TOTAL:                                                            00:03:00:02

  6     ES6080132806                OFF  OFF  A
      0 Pause                                   00:17:50:00  00:17:50:05  00:00:02:21  17:51:02
      1 6/Rumba Para Susi                       00:17:52:28  00:17:52:23  00:04:28:08  17:53:55
        TOTAL:                                                            00:04:30:29

  7     ES6080132807                OFF  OFF  A
      1 7/Vals Bretón-Muiñeira Pica             00:22:21:06  00:22:21:01  00:04:51:06  27:22:00
        TOTAL:                                                            00:04:51:06

  8     ES6080132808                OFF  OFF  A
      0 Pause                                   00:27:12:05  00:27:12:07  00:00:02:10  27:13:15
      1 8/Na Terra De Trasancos                 00:27:14:22  00:27:14:17  00:03:22:13  27:15:40
        TOTAL:                                                            00:03:24:23

  9     ES6080132809                OFF  OFF  A
      1 9/Muiñera De Alén                       00:30:37:05  00:30:37:00  00:02:28:23  30:37:72
        TOTAL:                                                            00:02:28:23

 10     ES6080132810                OFF  OFF  A
      0 Pause                                   00:33:05:21  00:33:05:23  00:00:02:03  33:06:55
      1 10/Ti E Máis Eu                         00:33:08:01  00:33:07:26  00:03:11:20  33:08:62
        TOTAL:                                                            00:03:13:23

 11     ES6080132811                OFF  OFF  A
      0 Pause                                   00:36:19:14  00:36:19:16  00:00:02:00  36:20:37
      1 11/Chao/Xose Seivane                    00:36:21:21  00:36:21:16  00:03:00:28  36:22:37
        TOTAL:                                                            00:03:02:28

 12     ES6080132812                OFF  OFF  A
      1 12/Chao-Curuxeiras                      00:39:22:17  00:39:22:14  00:03:25:21  39:23:32
        TOTAL:                                                            00:03:25:21

 13     ES6080132813                OFF  OFF  A
      0 Pause                                   00:42:48:03  00:42:48:05  00:00:01:26  42:49:10
      1 13/Marcha Procesional Dos C             00:42:50:06  00:42:50:01  00:04:35:16  42:51:00
        TOTAL:                                                            00:04:37:12

                                                00:47:25:15  00:47:25:17               00:47:26:16

PQ Listing showingengineer’s comments, track times, ISRC codes and otner information. 




PQ Lists 

The name PQ comes from the letter-code
abbreviations for the information contained in the
subcode of the Compact Disc. The P flag is the most
primitive flag; it changes state to indicate the
beginning of a new track. The Q subcode contains
information such as timing and program length,
copy prohibit or permit, emphasis condition, and
ISRC codes (see Chapter 20), most of which will be
stored in the final disc's TOC (table of contents).
The written PQ log is actually a redundant log, since
nowadays the master medium contains all the tracks
and an electronic version of the PQ codes. In the old
days,* the replication plant would take the written
information from the PQ log and enter it
electronically into a PQ editor, since most mastering
houses did not have a PQ editor. Today, while most
mastering houses generate their own PQ codes, all
responsible replication plants still require a written
PQ list. This is the only place they can see the names
of the titles, and the engineer's comments. Mastering
engineers appreciate good quality control
procedures after the master has left their
possession. A reliable plant will cross-check all the
information in the written PQ log against the
electronic version on the master medium, and call
the engineer if there are any discrepancies. An
exceptional plant will even note noises they hear or
over levels, and ask for the engineer's approval before
pressing. Sadly, this has become very rare, so the
burden for quality control has fallen heavily upon
the mastering house.
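
The cross-check a careful plant performs can be sketched in a few lines of Python. This is only an illustration, not any plant's actual procedure. It assumes the hh:mm:ss:ff columns of the PQ listing are 30 frames-per-second non-drop timecode (an assumption, though it agrees with the TOTAL figures printed in the listing), and the function names and the dictionary layout are hypothetical.

```python
FPS = 30  # assumption: 30 fps non-drop timecode in the hh:mm:ss:ff columns


def tc_to_frames(tc, fps=FPS):
    """Convert 'hh:mm:ss:ff' to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_tc(frames, fps=FPS):
    """Convert a non-negative frame count back to 'hh:mm:ss:ff'."""
    seconds, ff = divmod(frames, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def track_total(pause, duration):
    """TOTAL for a track = pause (index 0) plus music (index 1) duration."""
    return frames_to_tc(tc_to_frames(pause) + tc_to_frames(duration))


def cross_check(written, electronic):
    """Compare two PQ logs (hypothetical layout: track number -> dict of fields).

    Returns a list of readable discrepancies; an empty list means they agree.
    """
    problems = []
    for track in sorted(set(written) | set(electronic)):
        w, e = written.get(track), electronic.get(track)
        if w is None or e is None:
            problems.append(f"track {track}: missing from one log")
            continue
        for field in sorted(set(w) | set(e)):
            if w.get(field) != e.get(field):
                problems.append(
                    f"track {track}: {field} differs "
                    f"(written={w.get(field)!r}, electronic={e.get(field)!r})"
                )
    return problems


# Track 4 of the listing: 3:14 of pause plus 3:59:23 of music.
print(track_total("00:00:03:14", "00:03:59:23"))

# A made-up mismatch between the paper log and the master medium.
written = {4: {"isrc": "ES6080132804", "start": "00:10:50:07"}}
electronic = {4: {"isrc": "ES6080132804", "start": "00:10:50:12"}}
print(cross_check(written, electronic))
```

Any non-empty result from cross_check is the cue to call the mastering engineer before pressing.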


* Not so long ago, but computer years are like dog years, and in this fast-paced
world, ten years feels like seventy!



























IV. Mastering Output Formats 

While we can accept recordings in nearly any
format, only four media are suitable to be used by
the replication plant for CD-Audio (CD-A): DDP
(Disc Description Protocol, on Exabyte 8 mm tape),
PCM-1630 (on 3/4" video cassette), CDR (Orange
Book, write-once media), or Sony PCM-9000
optical disc. As of this publication, the PCM-1630
format is rapidly becoming obsolete, and the
PCM-9000 never took off and is also considered obsolete.
Of all the above formats, the PCM-9000 was
probably the most reliable. Almost as reliable and
most popular is the DDP, which can be duplicated at
4X and greater speeds (not necessarily producing
better sound quality, see Chapter 19). The least
reliable is the audio CDR, first because its error rate
is not as good as the DDP, and also because it is
easily susceptible to fingerprints and mishandling.

A DAT is generally not considered a suitable
medium for glass mastering, though one or two
plants have adapted their systems to work with
timecode DAT. The master must be recorded in one
continuous pass, without stopping, under the
control of a computer. Some recording engineers
attempt to deliver "masters" on CDRs recorded on a
stand-alone CD recorder, but this is usually
unsatisfactory because of the inaccuracy of the track
points, inability to put separate track end marks
(which creates extra-long track times), and E32
errors introduced every time the recorder stops its
laser (breaking the "one continuous pass" rule).

CDRs make reasonable sources for premastering (I
like them better than DATs), but not good masters
for glass mastering.


"Audio CDRs make good sources for
mastering, but not good output
masters."


The master medium which may take over is the DLT
(Digital Linear Tape). It has much higher
capacity, typically 40 to 80 gigabytes. In theory,
the DLT can carry the DDP protocol, and could take
over from Exabyte, but no one has implemented it.*

DLT is the specified medium for DVD and DVD-A
masters. Another up-and-coming medium is a
CD-ROM or DVD-ROM with DDP image files,
since the CD-ROM has excellent error correction.
More masters are now being sent to the factory via
high-speed Internet lines, which brings up legal
questions of just what medium is the physical
master.



V. Picking the Right DAW 

By the mid-80's Sonic Solutions Digital Audio
Workstations (DAW) had taken over the mastering
field. As soon as engineers discovered the virtues of
non-linear editing, and a workstation that could
integrate PQ coding with audio, they quickly
abandoned their slow Sony DAE-3000 editors.

Sonic workstations use a powerful Source-to-
Destination editing model that many editors prefer,
have extremely high data integrity (producing
clones of the source when not processing), and can
make those "impossible" audio edits through the
use of a very flexible crossfade editor. The crossfade

* There are actually two versions of the DDP protocol, version 1.0 and 2.0. In
addition, PQ code may be put at the head of the tape or at the tail. If put at the tail,
PQ codes and ISRC can be changed without rewriting the master. However, some
plants do not accept PQ codes at the end of the tape. Check with the plant in
advance if making anything other than version 1.0, PQ at head.






editor is the main reason why Sonic and its brethren
are very popular editors in the classical music field.
We'll see the editor in action in other chapters. To
this day, only a few other workstations or software
programs have been qualified or dedicated to
mastering: Audiocube, Pyramix (Merging
Technologies), SADiE, Sequoia and Wavelab. SADiE
has recently caught up to Sonic with converts, and
Sequoia has garnered a good number of dedicated
users. SADiE is now the only workstation to
incorporate a dedicated SCSI (hard disk) bus, which
makes it very stable and free from operating system
interference; you can even purposely crash
Windows and SADiE will keep on cutting a CDR.

The race is not yet won, since not every workstation
has the ability to do multichannel and high sample
rates with equal facility, and not all manufacturers
offer an upgrade to DVD-A and SACD authoring.

Other criteria appropriate to picking a DAW
include software and hardware reliability and
economic stability of the company. Consider the
number of man-years that have gone into software
and DSP development and make sure that
development is ongoing. Five man-years is the
minimum time I would consider required to make a
powerful, dependable mastering program. Be wary
of marketing promises: if the product does not have
the features you want today, don't buy it on the basis of
"real soon now." Find out if the company has fast
and efficient technical support. Ask if there is an
upgrade policy. Another valuable approach before
buying is to get feedback from users, especially
those doing similar work. Is there an established
user base and support group?

"The Customer is Always Right."
- Dale Carnegie

All these criteria raise the short-term purchase 
price of a good workstation, but greatly lower the 
long-term cost of ownership. 

Don’t Be a Complete Bithead 

Far less successful are engineers who attempt to
perform mastering on software platforms not
specifically dedicated to mastering, largely because
of lack of integrated PQ editing, low data integrity,
low sound quality, inflexible editing, and so on. If
you are going to dedicate yourself to mastering, you
must get a dedicated workstation. These
workstations have other attributes besides data
integrity: high calculation accuracy, which translates
to low distortion. They all implement proper
dithering (see Chapter 4), and high precision,
with the highest precision award going to the
Audiocube (64-bit floating point) or Sonic
Solutions HD (48-bit fixed point), which yield
excellent-sounding equalization and noise
reduction algorithms. But don't be a bithead,
because all things are never equal; the skill of the
programmer can turn everything around: one
programmer's 32-bit float can sound better than
another's 64-bit (see Chapter 16).

VI. Mastering Procedures 

Mastering With or Without a Producer Present? 

Mastering engineers are independent beasts
and can master quite comfortably without a
producer or artist present. Once there was a certain
type of mastering engineer who had a specific
sound; if you went to that engineer, you would send
your tape, and get her sound. But there are very few
(if any) of those kinds of mastering engineers, and
the reason is quite plain: every piece of music is
unique, and requires a special approach that is
sympathetic to the needs of that music and the
needs of the producer and artist.

A good mastering engineer is familiar with and
comfortable with many styles of music. She knows
how acoustic and electric instruments and vocals
sound, plus she's familiar with the different styles of
music recording and mixing that have evolved. In
addition, a good mastering engineer knows how to
take a raw tape destined for duplication and make it
sound like a polished record. Upon listening to a
tape, a good mastering engineer should be able to
tell what she likes and doesn't like about a
recording, and what she can do to make the
recording sound better. Then, by sympathetically
listening to, and working with, the producer, the
engineer can produce a product that is a good
combination of her ideas and the producer's
intentions, a better-sounding product than if the
engineer had simply mastered on her own.

The best masters are produced when both the
producer and the engineer solicit feedback, use
empathy, courtesy, and understanding, and are
willing to experiment and listen to new ideas.

My approach is to welcome and encourage the 
producers' input for they are the ones most familiar 
with the music and what they want it to say. 


If the producer cannot attend the mastering
session, then we'll have discussions prior to and
during the session of how they perceive their music,
and how I think it sounds. Sometimes it helps if the
producer sends in existing CDs as examples of their
tastes. Then I'll send a reference or evaluation CD
prior to the final mastering. Usually by that time we
are enough in sync that there is no need to produce a
second reference, or only some minor changes are
needed.

Weeks or even months prior to the mastering 
session, an exceptional producer will send a 
preliminary mix to solicit the mastering engineer’s 
feedback, because there are things which are better 
fixed in the mix or not possible to fix in mastering. 
We don’t hesitate to suggest a remix if there is a 
severe problem. The better the mix, the better we 
look! How much can the sonics of a mix be improved 
in the mastering? I like to answer: about a letter 
grade, which can turn a B plus mix into an A plus 
master! 

The Mastering Workflow 

The mastering engineer's workflow comprises
editing, cleanup, leveling, processing and output to
the final medium. Every engineer has a unique
approach, using analog or digital processing or a
hybrid. Currently, most engineers work with DAWs
in very much the same way we worked before there
were any DAWs³: First, we take the source for each
tune (e.g., DAT, CDR, Masterlink, AIFF or WAV
file), and process one song at a time. If that source is
digital and if analog processing is to be used, we
send it to a high-quality D/A converter, pass it
through one or more analog audio processors and



[Figure: signal-flow diagrams for the mastering variations]

Variation 1 chain: D/A -> Analog Equalizer -> Analog Compressor -> A/D ->
Digital Processor(s) -> Dither -> DAW; then DAW -> Final Medium.

Variation 1: All processing, fade-ins, fadeouts, are performed on
load-in through various analog and digital processors and the
result is recorded as a 16 bit/44.1 kHz file on the DAW, then
prepped for a last step, output to final master medium. Not all
of the illustrated processors may be used, and the fader may be
in a mastering console.

Variation 2 chain: DAW -> D/A -> Analog Equalizer -> Analog Compressor ->
Analog Limiter -> A/D -> Digital Processor(s)

Variation 2: Same as Var. 1, except the source has been
preloaded into a "source" DAW. All processing, fade-ins,
fadeouts, are performed while creating the second DAW file at
16 bit/44.1 kHz, then spaced and prepped for output to final
master medium. Fade-ins and fadeouts may be performed either
in the source DAW or in one of the external processors.

Variation 4: In this variation, all digital processors are used,
and the signal is upsampled to a higher rate for the lowest
distortion, then downsampled and stored as 24 bit/44.1 K on
the DAW for high resolution archive. In the last step, dither is
added for cutting to a CD master. In this variation, automation
can control the parameters of the processors, and thus the
entire CD can be created in real time with everything
"non-destructive", to permit easy revisions later.




possibly control the level, EQ, or fade via a
customized analog mastering console. The signal is
then passed to a high quality A/D converter,
optionally through various digital processors,
dithered to 16 bits, and then recorded into the DAW.
We then move on to the next song, resetting
processors until the best sound is achieved for that
song. And so on, as illustrated in Variation 1 of the
figure at left.

In this variation, all leveling, fading, processing
and equalization have already been accomplished,
and the DAW is only used for assembling and
spacing, which is a very efficient approach. When we
reach the end of the tune, if it requires a fadeout and
we missed it, instead of reloading the entire song we
may back up before the fadeout, do a simple punch-in
on the workstation, perform the fade, and then a
matched edit. Chapter 4 tells why this 16-bit file
should not be further processed. What this means
is that, in Variation 1, if the client orders any
revisions, the engineer must repatch the entire
chain, reset the processors, make any processing
changes, and re-record/replace the old destination
file with a new one.
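
To make the "dithered to 16 bits" step concrete, here is a minimal Python sketch of requantizing to 16 bits with and without dither. It assumes plain TPDF (triangular) dither of two LSBs peak-to-peak, a common textbook choice and not necessarily what any particular mastering processor uses; the function names are hypothetical. Chapter 4 covers the subject properly.

```python
import math
import random

FULL_SCALE_16 = 32767  # largest positive 16-bit sample value


def tpdf_dither_to_16bit(samples, rng=None):
    """Requantize floats in [-1.0, 1.0] to 16-bit integers with TPDF dither."""
    rng = rng or random.Random()
    out = []
    for x in samples:
        scaled = x * FULL_SCALE_16  # value in units of one 16-bit LSB
        # The sum of two uniform values is triangular noise spanning +/-1 LSB.
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = math.floor(scaled + noise + 0.5)  # round to the nearest integer
        out.append(max(-32768, min(32767, q)))  # clip to the 16-bit range
    return out


def truncate_to_16bit(samples):
    """Requantize by discarding everything below the 16-bit LSB."""
    return [max(-32768, min(32767, math.trunc(x * FULL_SCALE_16)))
            for x in samples]


# A steady signal one quarter of an LSB high. Truncation loses it entirely;
# the dithered output still carries it as its long-term average.
quiet = [0.25 / FULL_SCALE_16] * 20000
dithered = tpdf_dither_to_16bit(quiet, random.Random(1))
print(sum(truncate_to_16bit(quiet)) / len(quiet))
print(sum(dithered) / len(dithered))
```

The dithered file keeps low-level detail only as noise-like averages, which is one way to see why any further gain change or EQ on that 16-bit file would require another round of requantization.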


Often there is no real time "load in," since
sources may arrive as high resolution or high
sample rate computer files on CD ROMs or other
media, and can be loaded at high speed directly into
the workstation (Var. 2). The mastering engineer
then has to listen to each tune to get the feel of the
whole album and check for noises or other problems
that may need fixing. She may begin by putting the
material in order, cleaning up heads and tails,
performing fadeouts and spacing, and then proceed as
in Var. 1, except she uses the workstation as the new
"source" as well as destination. In Variation 3, the
mastering engineer waits until the final output to
dither, which gives some flexibility to perform
fade-ins and fadeouts on the final DAW file and perhaps
some leveling (although most of the leveling
should have been performed beforehand to avoid
cumulative loss of resolution). After digital limiting,
levels cannot be raised, only lowered,
and equalization should not be performed on a
previously-limited signal, as the peak protection
will be undone. Digital filtering of any type can
cause overloads on a digitally limited signal, because
it creates higher-level intersample peaks. Thus it is
best to return to the source and reprocess in order
to change levels between tunes.

At left: Infinite Variations on a Mastering Theme. Four examples of
approaches to audio mastering.
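
The intersample peaks mentioned above are easy to demonstrate. The following Python sketch is an illustration under stated assumptions, not a calibrated true-peak meter: it reconstructs the waveform between samples with a Hann-windowed sinc interpolator (a simple stand-in for a real reconstruction filter), and all function names are hypothetical. It shows a tone whose stored samples never exceed about 0.707 of full scale even though the waveform between them reaches full scale.

```python
import math


def sinc(x):
    """Normalized sinc function."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, t, half_width=16):
    """Estimate the waveform at fractional sample index t (windowed sinc)."""
    n0 = math.floor(t)
    total = 0.0
    for n in range(n0 - half_width + 1, n0 + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))  # Hann
            total += samples[n] * sinc(d) * window
    return total


def sample_peak(samples):
    """Largest magnitude among the stored samples."""
    return max(abs(s) for s in samples)


def intersample_peak(samples, oversample=4, half_width=16):
    """Largest magnitude on a grid `oversample` times finer than the samples.

    Positions near the two ends are skipped to avoid edge effects.
    """
    first = half_width * oversample
    last = (len(samples) - half_width) * oversample
    return max(abs(reconstruct(samples, k / oversample, half_width))
               for k in range(first, last))


# A full-scale sine at one quarter of the sample rate, phased so that every
# sample lands 45 degrees away from a crest.
tone = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(128)]
print(round(sample_peak(tone), 3))
print(round(intersample_peak(tone), 3))
```

A limiter that only looks at sample values would consider this tone to have about 3 dB of headroom that does not really exist, and any later filter that shifts phase can move the samples onto the crests.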

With the increasing number of high sample rate
projects, another variation is to use two
workstations, one to play back the high sample rate
material, the other to record a sample-rate-
converted and dithered output for CD prep. Yet
another variation is to use upsampling followed by
downsampling (Var. 4). Even if the source material
is ready for CD at 44.1 kHz, it is well known that
digital audio processing and conversion at a higher
rate sounds better (see Chapters 16 and 18). The
engineer may reproduce the source material at the
lower rate, feed an upsampling sample rate
converter (abbreviated SFC or SRC), then perhaps
D/A convert using a high-resolution, high sample
rate D/A for analog processing, then record the
material into a high sample rate A/D converter for
optional further digital processing, then finally
downsample and dither (if the result must be
16-bit). If the source material is at 44.1 kHz, a CD can
be cut in real time using this chain. But two steps
(and two DAWs) may be necessary if the source
material is not recorded at the target rate, since
most DAWs can only work at a single rate. First, the
material is stored at 24 bits/44.1 K on the new DAW
file, then it is dithered in the last step to the 16-bit
master medium.*

Material that arrives at multiple sample rates
(different songs at different rates) is particularly
problematic, often necessitating sample rate
conversion to a common rate before the mastering
can get started.

Tune by Tune or Fully-Automated? 

All of the above descriptions have one thing
in common: they follow a tune by tune approach
to mastering, i.e. master one tune, reset the
processors, then move on to the next one. Although
engineers have been making excellent albums using
this method for years, an increasing number of
digital audio processors are remote-controllable via
MIDI (Var. 4), which permits them to be automated
and thus completely integrated with the workflow.
Most engineers already use some sort of automation
in their work, since advanced workstations provide
automated equalization, leveling, fades, dynamics,
and even automated plug-ins. If a revision is
requested, the mastering engineer can save the
previous EDL (edit decision list) and instantly make
changes in the amounts or timing of the
* One unique workstation (Sequoia) permits working at two rates simultaneously,
so only one workstation is needed!





workstation's internal equalization. The MIDI
technique extends this ability to the outboard
equipment. For me this is a revolution: finally I can
work with the album in the making in a comfortable,
fluid, non-linear manner. I work with a song until it
is cooked, save the parameters in the memories of
the processors, and then move on to the next song
without having to capture to a DAW file. I save those
parameters in another processor memory, then
return to near the end of the previous song and play
the two together with the MIDI automation
following along, nondestructively. This makes it
easy to integrate two dissimilar songs, e.g. if one
ends big and the other begins soft and easy (more
details on this technique in Chapters 7 and 10). It's
also non-linear: having the context of the whole
album in development makes it possible to revisit
and reprocess any portion of the album. For
example, we may make a great climax, then recheck
the first song in context and reprocess it if necessary
without having to reload or recapture. Full
automation also permits special effects. For
example, as we approached the climax on one tune,
upon the entrance of a big vocal chorus, I created
MIDI-automated changes in the K-Stereo Processor
that increased step by step the spaciousness and
depth, producing a gigantic sound in the final
chords. After we're satisfied that the album sounds
good, we then go back to the beginning and cut a
CDR reference in real time with full automation.
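
As a rough illustration of what such automation consists of, the Python sketch below builds raw MIDI messages and a timed cue list. The Program Change and Control Change byte layouts are standard MIDI; everything else is an assumption made for the example: the cue times, the channel, the idea that a given processor recalls a stored memory on Program Change or responds to controller 7, and the helper names. A real sequencer such as the one described here handles all of this through its own interface.

```python
from typing import NamedTuple


def program_change(channel, program):
    """Raw MIDI Program Change: status 0xC0 plus channel, then program."""
    if not 1 <= channel <= 16 or not 0 <= program <= 127:
        raise ValueError("channel must be 1-16 and program 0-127")
    return bytes([0xC0 | (channel - 1), program])


def control_change(channel, controller, value):
    """Raw MIDI Control Change: status 0xB0 plus channel, controller, value."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= controller <= 127 or not 0 <= value <= 127:
        raise ValueError("controller and value must be 0-127")
    return bytes([0xB0 | (channel - 1), controller, value])


class Cue(NamedTuple):
    time_s: float   # position on the album timeline, in seconds
    message: bytes  # raw MIDI message to send at that position
    label: str      # note for the engineer


def cues_due(cues, previous_s, now_s):
    """Cues whose time falls in [previous_s, now_s), in time order."""
    due = [c for c in cues if previous_s <= c.time_s < now_s]
    return sorted(due, key=lambda c: c.time_s)


# Hypothetical cue list: recall one processor memory per song, and ease the
# level down just before song 2 begins.
cues = [
    Cue(0.0, program_change(8, 1), "song 1 settings"),
    Cue(230.0, program_change(8, 2), "song 2 settings"),
    Cue(228.0, control_change(8, 7, 100), "pull level down into song 2"),
]
for cue in cues_due(cues, 225.0, 235.0):
    print(cue.time_s, cue.message.hex(), cue.label)
```

Because the cue list is just data that follows the timecode, a revision means editing a cue and playing the passage again; nothing has been rendered into the audio.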

The biggest advantage of full automation is the
ease of revision, especially if you have a critical
clientele. Processing is always applied in a
non-destructive, non-cumulative manner; anything can
be undone without going down another generation
or forcing a reload. Another advantage of this
method is that the raw, unaltered sources can be
immediately compared with the master and
demonstrated to the client. We try to ensure the
master is better than the source in every possible
aspect; it's a sobering moment if we discover that
the source is better than the processed master, in
which case it's back to the drawing board!

One sonic advantage of this method is that the 
highest resolution processor can be used to change 
gain. Thus, the MIDI automation accomplishes the 
changes in levels from song to song. I often use the 
output gain of the mastering compressor, since the 
40-bit float Weiss has a more transparent-sounding 
gain change than the DAW or any other device in the 
chain, and this also avoids additional DSP. The 
biggest disadvantage of this method is the amount of 
technical know-how and concentration required to 
run a MIDI sequencer and control the parameters of 
external equipment. 

Here's how the MIDI-automated chain is
hooked together:

The audio resides in the mastering DAW on a
PC, for example, SADiE, which feeds a series of
external rack processors, and returns back into
SADiE. With SADiE or Sonic Solutions, the CD
master can be cut in real time using this routing if
the source audio is at 44.1 kHz SR.

The timecode master is SADiE, and this
timecode feeds another computer, in this case, a
Macintosh running a sequencer called Digital
Performer.



[Screenshot: Digital Performer tracks and automation windows]

Digital Performer in action, slaved to the mastering DAW. Performer automates both external devices via MIDI and plug-ins acting as outboard processors to the
main workstation, for example, SADiE.


The MIDI instructions are fed from
Performer to the external rack processors, and in a
cute trick, automate a native plug-in, the Waves C4,
implemented directly in Performer, illustrated in
the above figure. We treat the C4 functionally as
another rack device external to SADiE. Native
processors are not always used in mastering, but I've
created this illustration to show how it can be done
even when the mastering DAW does not support
"live" plug-ins.


























































VII. Media Verification,
Archiving/Backups

Listening Quality Control 

At the end of the project, the art of mastering
has to turn back into a science. In larger mastering
studios, this is performed by a separate QC
department. The QC engineer must have
musical/artistic ears, technical prowess, but also a
lot of common sense: the project has already been
auditioned by the mastering engineer and producer
and all the noises presumably were accepted,
perhaps even welcomed as "part of the music."

If a single unacceptable tic or noise is discovered
anywhere in a master, the entire full-length master
has to be remade and listened to/evaluated. There is
no shortcut. During the QC pass, we have to utilize
as many objective criteria as possible. For example,
a critical listener using headphones is bound to hear
more noises than someone using loudspeakers.
Does this mean that we have to use headphones to
verify a project?

If the monitoring acoustic is less than ideal, then QC
must be performed with headphones, but the
loudspeakers in a critically-designed mastering
room are more than adequate. Mastering engineer
Bob Ludwig has reported that headphone listening
becomes essential when the number of channels
multiplies. Potentially embarrassing noises or
glitches hidden in the surround channel when
auditioned on loudspeakers become quite audible
when that channel is isolated in a pair of
headphones. To complicate the situation even
further, one consumer may be auditioning all
channels using surround headphones while others
will be hearing stereo reductions (folddowns).
Clearly, we have to give much greater attention to
detail, and costly time, to evaluate a final master in
surround, even to the point of requiring 3-4 hours
to QC an hour program, including any extra passes
necessary to check a folddown! We then have to
decide how to deal with each noise that is
encountered during QC. We follow the practice
of noting the timecode of each offending noise,
then checking with the mastering engineer
and/or producer to see if the noise had already
been accepted.

"If a single unacceptable tic or noise is
discovered anywhere in a master, the entire
full-length master has to be remade and
listened to/evaluated. There is no shortcut."

QC also includes verification that the proper
songs are in the proper place, based on
client-supplied lists of the song lengths, lyric sheets, etc.
We must ensure that the correct master goes out for
duplication, and must be especially wary of
misidentifying individual CDs of a multiple-CD set.
With the advent of authoring and DVDs, more than
one QC may be needed, including the final
watermarked and MLP'ed* master. And with
electronic delivery come the legal issues of which
"physical master" has been officially evaluated.

Objective Media Verification/Error check 

Digital media are susceptible to data dropouts
which cause errors, which is why all the digital audio
storage formats, DAT, Exabyte, PCM-1630, and DLT

* MLP is Meridian Lossless Packing, see Chapter 15.





tapes, and optical discs, CDR and DVD-R, utilize
error correction algorithms.* Uncorrected errors
result in glitches, clicks, and other noises.

Normally, when playing a digital tape or disc, we do
not know the amount of error correction which is
occurring. It can sound great, but the tape or disc
could be near dying! If the error correction system
is working very hard, the next time that tape is
played, a speck of dust or head alignment problem,
or simply wear and tear, will cause a signal dropout
during playback. Our job is to look behind the
scenes using specialized measurement tools.
Listening alone is like having a doctor look at the
patient without taking his temperature. So media
verification is a thorough internal examination.

There is also the issue of error concealment,
which is the last defense mechanism in digital
playback. If the error correction does not work,
that is, if there is an uncorrectable error, then the
playback machine uses an interpolator. The
interpolator looks at the audio level before and
after a dropout and supplies an intermediate
replacement. If performed well, error concealment
can sound very good, but professionals never
use a master medium that is so degraded. On the
PCM-1630, error concealment can be turned off,
and the result is an audible mute that purposely
lasts a second or more, to call attention to itself.
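
The idea of an interpolator can be shown with a very small Python sketch. It assumes the simplest possible scheme, a straight line drawn between the last good sample before the dropout and the first good sample after it; actual players use their own, often more elaborate, concealment methods, and the function name is hypothetical.

```python
def conceal_dropout(samples, start, length):
    """Replace samples[start:start+length] with a straight line between the
    last good sample before the gap and the first good sample after it."""
    if length < 1 or start < 1 or start + length >= len(samples):
        raise ValueError("the gap needs a good sample on each side")
    before = samples[start - 1]
    after = samples[start + length]
    step = (after - before) / (length + 1)
    repaired = list(samples)  # leave the caller's data untouched
    for i in range(length):
        repaired[start + i] = before + step * (i + 1)
    return repaired


# Two samples (indices 2 and 3) were lost and read back as zero.
damaged = [0, 10, 0, 0, 40, 50]
print(conceal_dropout(damaged, start=2, length=2))
```

On a slowly changing signal the patch is inaudible, which is exactly why listening alone cannot tell you how hard the medium is struggling.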


* Hard discs, however, generally do not require error-correction, since their
error rates are extremely small.

† On the contrary, all the null test proves is that there were no uncorrectable
errors, but it is not a measure of media reliability or error-count. The null test
is post the error correction. You could be one bit away from failure and not
know it. The next time an error-prone disc plays, there could be an
interpolation or a mute if the error count is high. Thanks to Glenn Meadows for
pointing out these facts.


The PCM-1630 system uses an evaluator known
as the DTA-2000, and each plant and mastering
facility decides on objective criteria for
acceptability. For example, some houses reject tapes with
CRC (Cyclical Redundancy Check, correctable
errors, aka soft errors) counts over 50 in any minute,
or over 200 total on any tape. Other houses accept
up to 300 or even 400 CRCs in an hour, though this
is considered exceptional or rare, and an indication
of poor master tape quality. Of course, any
uncorrectable error is cause for rejection of a
master.
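
An acceptance rule of this kind is simple to express in code. The Python sketch below uses the first set of thresholds quoted above (more than 50 CRCs in any minute, or more than 200 on the tape) as defaults; the input format, a list of per-minute CRC counts plus a count of uncorrectable errors, and the function name are assumptions for illustration and do not describe the DTA-2000's actual output.

```python
def evaluate_tape(crc_per_minute, uncorrectable_errors,
                  max_per_minute=50, max_total=200):
    """Return (accepted, reasons) for a tape's error log.

    crc_per_minute: correctable (soft) error count for each minute of tape.
    uncorrectable_errors: number of errors the correction could not fix.
    """
    reasons = []
    if uncorrectable_errors > 0:
        reasons.append(f"{uncorrectable_errors} uncorrectable error(s)")
    worst = max(crc_per_minute, default=0)
    if worst > max_per_minute:
        reasons.append(f"{worst} CRCs in one minute (limit {max_per_minute})")
    total = sum(crc_per_minute)
    if total > max_total:
        reasons.append(f"{total} CRCs in total (limit {max_total})")
    return (not reasons, reasons)


# A hypothetical four-minute log with one bad minute.
print(evaluate_tape([3, 0, 61, 2], uncorrectable_errors=0))
# A more lenient house might raise the limits instead.
print(evaluate_tape([3, 0, 61, 2], 0, max_per_minute=100, max_total=400))
```

Keeping the limits as parameters reflects the point made above: each plant and mastering facility sets its own criteria, but an uncorrectable error fails under all of them.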



MYTH:

An audio loadback/null test
shows the integrity
of a CD Master.†


[Figure: printout of a CD-A media test report]

CD-A Report from Clover brand analyser. Note the BLER value of 29 in any 1 second maximum at :20 Abs time.



Exabyte drive reports: retry 
level of 0 percent, and an error 
level of 0.0443386 percent 

This error rate is within the 
factory standard for a new 
drive. 

Delivery Job Complete 


Exabyte Error Report from Sonic Solutions showing the total error rate for the
duration of the tape.



[Figure: SADiE verification log, C0201032 - 20011028.txt. The PQ list
read from the tape is compared with the list in memory (verification
successful, lists are identical), then audio integrity is verified
block by block. Columns: block size, block count, elapsed time
(mm:ss.msec), kilobytes per second, soft error count. Nearly every
row shows a soft error count of 0, with an occasional 1.]

Comprehensive Exabyte Error Report from SADiE System, showing errors at each
block, which goes on for 30 more pages! The plant is satisfied with a one page
graphic summary showing the total count of errors and that no large error
amounts occur in any short period.




The Clover system is a popular CDR media
evaluator. The most critical criterion for CD-A and
CDR quality is called BLER (Block Error Rate). A
very good CD can have a BLER as low as 10, yet CDs
will still play with BLERs of 1000 or even above,
which illustrates just how robust the error
correction system is for CD-A. CD ROMs use an
additional layer of error correction. One conservative
mastering house's standard rejects any CDR
with BLER over 100, or any CDR with an E32
(uncorrectable) error.

For Exabyte tapes, reports can get as complex as
a multipage document showing error count in each
block, or simply a one paragraph total report
indicating error percentage (see figures at left).
Many mastering houses will reject Exabytes with
error percentages over 0.1%, though 0.2% or even
0.3% error is quite acceptable, as long as there were
no read-after-write retries in the error report.
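To make the one-paragraph report concrete, here is a minimal Python sketch that pulls the two percentages out of text worded like the Sonic Solutions report shown in the figure and applies the stricter 0.1% limit. The function names and the regular expressions are assumptions for illustration; real reports may be worded differently, and treating the "retry level" as the read-after-write retry figure is also an assumption.

```python
import re

# Wording taken from the example report in the figure.
REPORT = ("Exabyte drive reports: retry level of 0 percent, "
          "and an error level of 0.0443386 percent")

def parse_exabyte_report(text):
    """Return (retry_percent, error_percent) found in the report text."""
    retry = re.search(r"retry\s+level\s+of\s+([\d.]+)\s+percent", text)
    error = re.search(r"error\s+level\s+of\s+([\d.]+)\s+percent", text)
    if retry is None or error is None:
        raise ValueError("report text is not in the expected form")
    return float(retry.group(1)), float(error.group(1))

def exabyte_acceptable(text, max_error_percent=0.1):
    """Accept only tapes with no retries and an error rate within the limit."""
    retry_percent, error_percent = parse_exabyte_report(text)
    return retry_percent == 0.0 and error_percent <= max_error_percent
```

A house that accepts 0.2% or 0.3% would pass a larger `max_error_percent`.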

Other QC Issues

The responsibility for QC must be accepted by
someone, but the movements of technology and
economics are making it difficult to guarantee
standards. The PCM-1630 has obtained legendary
status for its sonic quality, and it also forces glass
mastering to be at 1X speed, where the master may
be auditioned, thus gaining one critical stage of
Quality Control. However, the 1630 technology is
now old enough to be causing concern about its
reliability, and many plants copy from 1630 to
Exabyte to avoid problems during expensive glass
mastering.





There is usually no press proof except when very
large quantities are pressed. There used to be a
listening room at each pressing plant where masters
were auditioned prior to glass mastering. But now
when the master arrives at the replication plant,
whether in physical or electronic form, it will likely
be copied high speed to an Exabyte tape or to the
factory's central server, and there is no auditioning
during glass mastering. The day has come when the
home consumer is the first person to audition the
product! Every project needs a Mothership to get
through this mess.*

Since human QC at the plant seems to be
decreasing, especially for electronic delivery, I
propose that the approved electronic delivery have
an error-detecting format built-in, as used by
programs like ZIP for the PC and Stuffit for the Mac.
On opening, an error will be generated if a stuffed
file does not contain the identical data that was used
to create it. Using such a coded master can confirm
that the file remains intact through all transfers up
to the point of glass mastering. The Meridian
Lossless Packing format (MLP), used for the DVD-A,
is a self-correcting medium, but its cost and
encoding time make it overkill for simple stereo
work.
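The idea of an error-detecting delivery container can be sketched with Python's standard `zipfile` module, which stores a CRC-32 for each member and checks it when the member is read back. This is only an illustration of the principle: the function names and the file name used below are hypothetical, and the archive is stored uncompressed on the assumption that the goal is integrity checking rather than saving space.

```python
import io
import zipfile

def pack_master(name, audio_bytes):
    """Wrap the audio data in a ZIP container (stored, with a CRC-32)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, audio_bytes)
    return buffer.getvalue()

def master_is_intact(archive_bytes):
    """Return True if every member of the archive passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # The container itself is damaged beyond reading.
        return False
```

The plant (or anyone else in the chain) can run the second function after each transfer; a single altered byte in the audio data makes the check fail.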

Backups/Archives 

After a project is finished, we wait until the
client has approved the master (usually by listening
to a copy of the master). We then may wipe the
material from our hard discs, but not before saving

* Thanks to Mike Collins, One To One Magazine, November 2001, and to various
discussions on the Mastering Webboard, for inspiring this section.


the logs on hard disc with all the material, and
making an in-house audio backup on some form of
computer tape. The in-house backup is mostly in
case a revision is requested within a reasonable
time since, as we mentioned, digital technology is
constantly changing. Some record labels require
full backups of the masters, often on Sonic
Solutions Exabyte tapes, or some other acceptable
archive format.


The critical difference between a backup and an archive is
that an archive is made to a medium which is supposed to
last a long time (30 years or more). However, I wish good
luck to those who have to decipher those multi-formatted
ones and zeros; will the equipment still be around to read
them even ten years from now? Computer manufacturers
seem bent on obsolescence and equipment turnover,
which makes the idea of full data-recovery
frightening. Technological evolution is a serious
issue.

Backups? We don't need no ba&*g u.





1 I am reminded of an analogous situation in the film world. Prior to 1977, the role
of Sound Designer was unheard of, but Ben Burtt received the honor of the title
on the first Star Wars film. The Sound Designer is the Mothership for the entire
film, coordinating the film from first recording, through transfers, editing, and
the final mix. As a result, the film takes on a flowing, gestalt feel.

2 One mastering engineer reported a situation where another house added the CD
ROM portion to an extended CD, and somehow in the process, changed the audio
quality of the audio portion. Never assume that everything will be fine when the
master goes out the door, even to the extent of (on critical projects) approving and
testing the final product. It is possible to do null tests or bit for bit comparisons
which compare the original audio master against the final pressing, assuring that
the audio data had not been altered after it left the mastering house.

In another situation, a less than reputable plant copied all incoming masters
using a consumer-based program which automatically shortens tracks to the end
marks, and then puts 2-second silent gaps between all the tracks. Thus, the final
pressing of a beautifully engineered live concert sounded like it was edited with
an axe! These are real horror stories from the trenches, so be sure to mind your
Q's and C's!

3 Well, this is true for CD mastering. But if you go way back to the ages of LP cutting,
the cutting engineer was forced to cut an entire record in one continuous pass. If
you stop, you create a locked groove, which you could say was yesterday's E32
error. A sophisticated LP cutting engineer would note settings for each tune and
manually change her processors during the banding between each track.
Equalizers were developed with A and B settings, allowing her to press one switch
during the intertrack gap, and then leisurely preset the opposite equalizer for the
next track. Primitive, but roughly equivalent to the fully-automated process which
I use today.



Chapter 2

Connecting It All Together


The Principle of Consistent Monitoring

The following page shows a block diagram of the
audio connections in the ideal digital audio
mastering studio. The heart of this studio is an
integrated A/D/A system (1), typically 6 to 8
channels. Since our clients expect us to make
consistent quality judgments, we audition all digital
sources and pressed media through this single
converter. Unfortunately, this principle of
consistent monitoring has been subverted by the
advent of new copy-protected media such as DVD-A
and SACD, whose players do not have digital
outputs; thus it is not always possible to proof the
final product through the same D/A converters that
were used for the mastering.

All channels of the A/Ds and D/As are housed in
the same chassis, with internal clock connections
designed for minimum jitter and immunity from
external jitter. In Chapter 19 we will learn why this
is the best architecture for minimum jitter. With a
jitter-immune system, the mastering engineer
avoids chasing ghosts and non-problem problems.

Routing It All 

[Figure: Block Diagram of a State of the Art, Jitter-Immune Digital
Audio Mastering Studio. Blocks: digital sources and destinations (CD,
DVD, DVD-A, DAT, Masterlink, DAW, 8 track, possibly 24 track, etc.);
32 x 32 (64 ch. AES) remote-controlled router; digital monitor
selector; inputs from analog processors, tape playback, turntable
playback, etc.; digital processor (one of several); bitscope/digital
meter; a master clock buss feeding all converters. Legend: wordclock
(one or more); analog (one or more chs.); digital (AES/EBU, S/PDIF,
etc.) (one or more).]

The router (2) switches all digital sources and
destinations in any combination. A 16 x 16 router
can be used in a smaller studio or one dedicated to
stereo production, but at least 32 x 32 is required
for surround work. The Z-Systems brand of routers
can switch virtually any type of signal and support
multiple sample rates and different synchronizations
in the same chassis, can be configured for different
voltage and impedance standards, and thus can
be used for AES/EBU or S/PDIF (2 channels per
connection for a total of 64 in and out at any
standard sample rate) or Dolby E (8 channels per
connector) or Dolby Digital (6 channels), MADI
(multiple channels) or encoded formats such as
MP3, even to distribute wordclock. Possible sources
and destinations include DAW(s), tape, CD(R),
digital compressors, equalizers, A/D and D/A
converters, and so on. One digital source can be
routed to multiple destinations, but any digital input
can only accept a single source.
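The routing rule in the last sentence (one source may feed many destinations, but each destination listens to exactly one source) maps naturally onto a lookup keyed by destination. The following Python sketch is a toy model only; the class and method names are hypothetical and have nothing to do with the control protocol of any real router.

```python
class Router:
    """Toy model of an N x N digital router."""

    def __init__(self, size):
        self.size = size
        # Maps each destination port to the single source feeding it.
        self.source_for = {}

    def connect(self, source, destination):
        """Route a source to a destination, replacing any earlier route."""
        for port in (source, destination):
            if not 1 <= port <= self.size:
                raise ValueError("port %d is outside 1..%d" % (port, self.size))
        self.source_for[destination] = source

    def destinations_of(self, source):
        """List every destination currently fed by the given source."""
        return sorted(d for d, s in self.source_for.items() if s == source)
```

Because the dictionary is keyed by destination, connecting a second source to the same destination replaces the first, while one source can appear under any number of destinations.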

Complex chains with analog or digital
components can be created at the push of a button,
since the analog processors are connected to the
converters, and the converters are connected to
the router. For example, this figure shows the
Macintosh computer-based remote control for a
Z-Systems 16x16 router.



Mac-based remote control for
a Z-Systems 16x16 router.


Individual setups can be saved and named for 
each project. For example, in this project, a stereo 
loop begins at the DAW and returns to DAW: Sonic 
Solutions M3/M4 feeds the Z Systems digital 
equalizer, which then feeds TC System 6000 inputs 
1/2 for further processing, then to POW-R dither, 
and back to Sonic inputs L3/L4, where they are
routed to the SCSI CD recorder or master tape 
machine. This router setup also handles 2-channel 
monitoring, and provides an auxiliary loop path to 
and from the DAW and a reverb unit (the Sony V77). 

In my mastering studio, the TC System 6000
functions as the central A/D/A converter, calibrated
digital monitor level control, Folddown 1 control,
master clock, and insertion between digital points
and analog processors. In other studios, some of
these functions are relegated to the analog
monitor/line-stage preamp (4), which follows the
monitor DAC, but my line-stage preamp just serves to
check the direct sound of analog-only sources
(such as turntable or tape deck).

The digital monitor selector (3) is a smaller
router (8x8 recommended) which takes any subset
of the 32x32 and routes it to the monitoring DAC.


The top component in this rack is a Tascam DA-45, 24-bit DAT machine, below
which is the Digital Domain model VSP, which selects from 6 digital sources for
recording or dubbing, and 6 sources for monitoring. An A/B monitor selector
allows for comparisons. Below the VSP is a Waves L2 digital limiter, below which is
the front panel of a remote-controlled Z-Sys 16x16 router.


This allows A/B monitoring or comparison
of any two digital sources,
such as "before" and
"after" mastering. Digital
Domain manufactures a
digital monitor selector
called the VSP (see photo
on page 37), that allows
instant A/B selection of
any two stereo sources,
and can preselect from 6
choices.

24 Bits active on the bitscope. 

Normally, the converters perform best on internal sync, but when
doing video, the converters must slave to the
wordclock which comes from the NTSC to wordclock
converter (5), and we have to depend on the quality
of the converter's PLL to reduce jitter, explained in
Chapter 19. A wordclock distribution amplifier (6)
feeds multiple wordclock lines to the DAW, DAT
machine, CD transport, and some processors
which support wordclock input. Otherwise, we must
depend on AES black or signal-carrying AES to
synchronize the ancillary digital gear.

16 Bits active on the bitscope, truncated after the LSB.

Other important equipment includes (7), a
bitscope (see photos this page) and digital meter,
which can be routed from any digital
source. The bitscope serves to
double-check the bit-integrity of
the source, confirm that dither
appears to be functional, and that
there are no extra bits due to
hardware or software bugs.
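The principle behind a bitscope can be illustrated in a few lines of Python: look at a run of samples and report which bits of the word ever change. This is a simplified sketch, not a description of how any particular hardware bitscope works; the function name is hypothetical and the samples are assumed to be signed integers in a 24-bit word.

```python
def bit_activity(samples, wordlength=24):
    """Return a string of 1s and 0s, MSB first, marking bits that toggle."""
    if not samples:
        raise ValueError("need at least one sample")
    mask = (1 << wordlength) - 1
    # Masking gives the two's-complement bit pattern of negative samples.
    reference = samples[0] & mask
    toggled = 0
    for sample in samples:
        # Any bit that differs from the first sample has been active.
        toggled |= (sample & mask) ^ reference
    return format(toggled, "0{}b".format(wordlength))
```

A 16-bit source sitting in a 24-bit word shows sixteen active bits followed by eight inactive ones; unexpected activity below the intended wordlength is the kind of "extra bits" symptom described above.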

I usually connect the meter and 
bitscope to the same router output. 
Pictured are two examples of 
common meters used in mastering. 

I. Block Diagram and Wire 
Numbers 

When constructing a mastering 
studio, begin with a detailed block 
diagram, inserting wire numbers 
from a separate wire number list.

On the opposite page is an example 
block diagram, with wire numbers 
in parentheses. 

Proper grounding and wire layout techniques are critical. 2 A
modern-day digital mastering studio may contain only a few
analog processors, so it is easy to put all the analog gear
physically together in its own rack, at a distance from clock
interference. Analog gear used for mastering can be
customized for minimalist signal path, removing
transformers and superfluous active stages,
something which is not advisable in a large analog
studio where ground loops are more difficult to
chase down. I avoid analog patch bays, as they only
deteriorate over time and their small contact area
contributes to contact-resistance-distortion,
preferring to use instead individual short
interconnect cables.

Mytek digital Meter DDl with 96 kHz upgrade. Meter only responds to
top 16 bits of signal, but indicates and counts overloads with a
clever counter.

Dorrough Loudness Meter. It's extremely useful due to the dual-scales, but I
quibble with calling it a "loudness monitor," since it does not correlate with
loudness any better than a standard VU meter.

Some mastering studios have constructed
custom mastering consoles, which insert analog
elements at will. My approach avoids a mastering
console, since all the analog gear is patched
manually, and the monitoring functions are
absorbed by a custom-built analog monitor selector
and level control. Every mastering engineer has his
own variation on these themes. The digital
equivalent of a mastering console is accomplished
by a combination of the Z-Sys routing, the digital
monitor selector, plus the TC System 6000, which
has internal stereo and 5.1 processing including
fold down, some internal mixing capability, analog
digital insert points and a remote control with a tiny
acoustic footprint. Some mastering studios use
digital mixing consoles for mastering. The DAW
also contains some routing and can be used as part
of the console concept.



Mastering Studio block
diagram with wire numbers in
parentheses.


Tools that we're missing: Customized
and special-purpose gear

One tool that I am missing is a more ergonomic
method of routing. Instead of a crossword-puzzle
routing matrix, I'd like to see specialized software to
control routers that illustrates the audio chain the
way we think, from source to output in a
straightforward linear fashion. A company called
Crookwood has created modular control systems
for this purpose.


1 Fold down is the ability to take a multichannel or stereo source and monitor a
reduction to 2 channels or one (mono). We use this to help confirm
compatibility of a 5.1 recording to stereo and/or stereo to mono.

2 See Appendix 10 for recommended reading. 





Chapter 3

An Earientation Session

1. Introduction


Ear training is really mind training, because the
appreciation of sound is a learned experience.
Stereo imaging is an illusion that some people still
don't get! The first listeners to Edison's acoustic
phonograph felt that its reproduction was
indistinguishable from real life. It is only with each advance
in sound reproduction that most people become
aware of the shortfalls of the previous technology.
For example, whenever I work at a very high sample
rate, and then return to the "standard" (44.1 kHz)
version, the lower rate sounds much worse,
although after a brief settling-in period, it doesn't
sound that bad after all. [See Chapter 18]


As we become more sophisticated in our
approach to listening, we develop a greater
awareness of the subtleties of sonic and musical
reproduction. We can also grow to like a particular
sound, and each of us has slightly different
preferences, which vary over the years. When I was
much younger, I liked a little brighter sound, but
from about the age of 20, I've tended to prefer a
well-balanced sound and immediately recognize
when any area of the spectrum is weak or over
present. It's also important to recognize that a
frequency emphasis that's too strong for one
musical genre or song may be just right for another,
as we explain in Chapter 8.


A mastering engineer requires the same ear
training as a recording and mixing engineer, except
that the mastering engineer becomes expert in the
techniques for improving completed mixes, while
the mixing engineer specializes in methods for
improving the mix by altering the sound of
individual instruments within it. As we move into
the era of mastering from stems (sub mixes, or
splits of a larger mix, e.g., vocals, bass, rhythm),
there will be more overlap between mixing and
mastering, since the mastering engineer will also
then have some control over individual
instruments or groups.

Ear training can either be a passive or a hands-on
activity. Passive ear training goes on all the time
("what a tinny speaker in that P.A. system"), while
active ear-training occurs while your hands are on
the controls. Make passive ear training a lifelong
activity; exercising your ear/brain connection
regularly will increase your ability to discriminate
fine sonic differences. Practice being consciously
aware of the sounds around you and identifying
their characteristics. Acousticians can't help
judging the reverberation time of every hall they
enter. Too much ear-training practice can ruin the
enjoyment of a musical program or a good
relationship, so rule number one is not to tell your
spouse every time you notice the surrounds in the
movie theatre are set too high or the left tweeter is
blown! However, when the program material is
sufficiently boring, work on ear-training. For me
it's a curse that hits subconsciously at the strangest
moments ("what a boxy-sounding reverb chamber
they're using").


Hands-on ear training is the process of
learning how to manipulate the controls of an audio
system to arrive at the sound you have in your
head; this is also known as developing hand to ear
coordination. With practice, you can learn to get
there quickly and efficiently. Before you
work on a piece of music, try to visualize
(audiolize?) the sound you are looking for; you
should have a definite sonic goal in mind. I
received a mix from a musician who is a fine jazz
bass player. It was obvious to me that he had not
listened to the mix over a variety of playback
systems, for the bass sounded muddy, indistinct,
and uneven, the last thing a bass player would want
to hear, and the instrument was also much too loud.
Fortunately the bass player agreed with me on all my
judgments. I diagnosed this as a case of
small-speaker near-field-itis and it wasn't long before I
found the cure with equalization and dual-band
upward expansion (explained in Chapter 11).
Sometimes we don't know how we're going to solve a
problem, but having a clear goal keeps us from
fumbling.

Speaking the Language 

The classic chart folded into the front cover was
hand-drawn in 1941 by E.J. Quinby of room 801
within the depths of Carnegie Hall.* We've
reproduced it for the benefit of musicians who want
to know the frequency language of the engineer, and
for engineers who want to speak in a musical
language. Sometimes we'll say to a client, "I'm
boosting the frequencies around middle C," instead
of "...around 250 Hz". Learn a few of the key
equivalents, e.g., 262 Hz represents middle C, 440
is A above middle C, and then remember that an
octave is a 2X or 1/2X relationship. For example, 220
Hz is the frequency of A below middle C in the
equal-tempered scale. The ranges of the various
musical instruments will also clue you to the
characteristics of sound equalization: next time you boost
at around 225 Hz, think of the low end of the English
horn or viola.

* I've never visited that room, but it would be an interesting archeological voyage
to find out who E.J. Quinby was.
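The note and frequency equivalents quoted in this section follow from the equal-tempered scale, in which each semitone multiplies the frequency by the twelfth root of two and each octave doubles it. A minimal Python sketch (the function name is ours, and A = 440 Hz is assumed as the reference):

```python
def note_frequency(semitones_from_a440):
    """Equal-tempered frequency in Hz, counted in semitones from A = 440 Hz."""
    return 440.0 * 2.0 ** (semitones_from_a440 / 12.0)

# Middle C lies 9 semitones below A 440.
middle_c = note_frequency(-9)

# One octave is 12 semitones, i.e. a 2X or 1/2X relationship.
a_below_middle_c = note_frequency(-12)
a_octave_above = note_frequency(12)
```

Here `middle_c` comes out at roughly 261.6 Hz, which is the "262 Hz" above, and the two octave steps land on 220 Hz and 880 Hz.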

Although it helps an engineer to have played
an instrument and to be able to read music, many
successful engineers can do neither. Nonetheless,
they are not handicapped because they have good
pitch perception, can count beats and understand
the musical structure (verse, chorus, bridge...)
very well.

This next chart is a graphic representation of the subjective
terms we use to describe excesses or deficiencies of various
frequency ranges.

Excess of energy is shown above the bar and a deficit below.
The bar is also divided into eight approximate regions. There are no
standard terms for these divisions: what some people call
the upper bass, others call the lower midrange; some call the
upper midrange what others call lower treble. Notice that we have
far more descriptive terms for areas that are boosted
as opposed to those which are recessed. This is
because the ear focuses much more on boosts or
resonances than on dips or absences.*

* Jim Johnston (in correspondence) points out that peaks change the partial
loudness of a signal more than dips. It's all psychoacoustics!

A few subjective examples 

With an equalizer, the sound can be made
warmer in two ways: by boosting the range roughly
between 200 and 600 Hz; or by dipping the range
roughly between 3 and 7 kHz. These two ranges form
a yin and yang, which we'll discuss in Chapter 8.
Another way to make sound warmer (or its converse,
edgier) is to add selective harmonics, as described
in Chapter 16. Too much energy, and/or distortion,
in the 4 to 7 kHz region can be judged as edgy,
especially with high brass instruments. Equalizing
in this region can exaggerate or de-emphasize the
harmonic distortion of a preamplifier or converter.
The term presence is associated with any sound that
is strong and clear, which often means a strong
upper midrange, but too much presence can be
fatiguing or harsh. If the sound is edgy, it can often
be made sweet(er) by reducing energy in the 2.5 to 8
kHz range. Too much energy in the 300-800 range
gives a boxy sound; go up another third octave and
that excess is often termed nasal. A deficiency in the
range from roughly 75 to 600 Hz creates a thin sound.

[Figure: Subjective Terms we use to describe Excess or Deficiency of
the various Frequency Ranges. Legible labels include MUDDY, THICK,
PUNCHY, IMPACT, SLAM, FULL, EXTENDED BOTTOM, SOLIDITY, BOOMY, FAT,
SWEET, PRESENCE, NASAL, BRIGHT, EXTENDED TOP and AIR, placed along a
frequency axis divided into regions running from low bass to treble.]

Ear Training Exercise #1:

Learn to Recognize the Frequency Ranges

Learning to recognize frequencies is an exercise
in the perfection of pitch perception. To have perfect
pitch means that you can identify each note
blindfolded. At concerts it's a neat trick if you can
identify the frequency of feedback before the mix
engineer. But this ability is not just a trick: if you
learn to identify the ranges by ear, this will greatly
speed up your performance at the equalizer's
controls. There was a time when I practiced until I
could automatically identify each 1/3 octave range
blindfolded, but now my absolute pitch perception
is between 1/3 and 1/2 octave, which is about what
you need to be a fast and efficient equalizer. Start
ear training with pink noise and then move to
music, boosting each range of a 1/3 octave graphic
equalizer until you can recognize the approximate
range. Get a friend to boost the EQ faders randomly
and give a blindfold test. Don't be dismayed if
you're only accurate to about an octave. This will get
you close enough to the range of interest to be able
to "focus" the equalizer the rest of the way.
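For anyone who wants to score the blindfold test, the sketch below (in Python) lists the bands of a 1/3 octave equalizer and measures how far a guess lies from the band that was actually boosted, in octaves. The function names are hypothetical, and the centres are computed as exact powers of two around 1 kHz, which is an assumption: the frequencies printed on a real equalizer's faders are rounded nominal values.

```python
import math

def third_octave_centers(low_hz=20.0, high_hz=20000.0):
    """Centre frequencies 1000 * 2**(n/3) Hz spanning roughly 20 Hz to 20 kHz."""
    n_low = round(3 * math.log2(low_hz / 1000.0))
    n_high = round(3 * math.log2(high_hz / 1000.0))
    return [1000.0 * 2.0 ** (n / 3.0) for n in range(n_low, n_high + 1)]

def octaves_off(guess_hz, actual_hz):
    """Distance between a guessed and an actual frequency, in octaves."""
    return abs(math.log2(guess_hz / actual_hz))
```

A guess of 500 Hz for a boost at 1 kHz is a full octave off, which the text above treats as a perfectly usable starting point; a guess within one fader either side of the right one scores about a third of an octave.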

Ear Training Exercise #2:

Learn the Effects of Bandwidth Limiting

Less-expensive loudspeakers usually have a
narrower bandwidth, as do lower-quality media and
low sample rates (e.g. the 22.05 kHz SR audio files
often used in computers). Train your ears to
recognize when a program is naturally extended,
and when it has been bandwidth-limited. It's
surprising to discover how much high end filtering
you can get away with, as can be heard when old
films with optical sound tracks are shown on TV.
Most musical information is safely tucked away in
the midrange, the only frequencies that remain in
an analog telephone connection. My career in
television began when telephone landlines were still
the primary means of network transmission, and I
soon learned that a 5 kHz bandwidth takes away the
life and clarity of the sound, even if all the
informational content is there. Those were not pleasant days
before satellite transmission and ISDN opened up
network television sound to high fidelity. Practice
learning to identify these effects using high and low
pass filters on various musical examples. As for the
bottom end, the human ear tends to supply the
missing fundamentals. This can be observed when
watching an old TV show that's been dubbed and
filtered too many times; you may not notice the
voice is very thin-sounding until it's been pointed
out to you. Another way to study the contribution of
the low bass range is to turn your subwoofers on and
off, or listen to historic acoustic recordings.

Ear Training Exercise #3:

Learn to Identify Comb Filtering

About the only advantage of the English system of
measurement is that the speed of sound is a nice
round number, about 1000 feet per second, or even
more approximately, one foot per millisecond.
When a single sound source is picked up by two
spaced microphones, and those microphones are
combined into a single channel, audible comb
filtering will result if

• the gain of each microphone is about the same and
the microphones are identical or similar models.
When one mike's gain is reduced at least 10 dB, the
comb filtering becomes audibly insignificant.

• the relative mike distance from the source is in
the critical area from about 1/2 foot (~150 mm)
through about 5 feet (~1.5 M). At 5 feet, the
attenuation of the more distant mike's signal also
reduces the combing effect.

[Figure: Severe Comb Filtering]

Comb-filtering can occur anytime a source and
its delayed replica are mixed to a single channel. The
above figure shows the frequency response resulting
when the source and the delay are at equal gain. The
vertical divisions are 3 dB. From top to bottom: a
delay of 3 ms (approximately equivalent to a 3
feet/1 M path difference), 1 ms, and 2 ms. In real
life, the reflection (delay) will be diffused and
somewhat attenuated, so the comb-filtering effect
will be less severe.
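The arithmetic behind the comb can be sketched briefly. When a signal is summed with a copy of itself delayed by a time t, cancellations fall at odd multiples of 1/(2t); if the delayed copy is attenuated to a linear gain g below unity, the response swings between 1 + g and 1 - g instead of cancelling completely. The Python below is a minimal illustration of that idealized two-path model (the function names are ours, and real reflections are diffuse, as noted above).

```python
import math

def notch_frequencies(delay_ms, max_hz=20000.0):
    """Frequencies (Hz) cancelled when a signal is summed with a delayed copy."""
    delay_s = delay_ms / 1000.0
    notches = []
    k = 0
    while True:
        # Notches fall at odd multiples of 1 / (2 * delay).
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            return notches
        notches.append(f)
        k += 1

def ripple_db(delayed_gain_db):
    """Peak-to-dip depth of the comb when the delayed copy is attenuated."""
    g = 10.0 ** (delayed_gain_db / 20.0)
    if g >= 1.0:
        return math.inf  # equal gain: complete cancellation at each notch
    return 20.0 * math.log10((1.0 + g) / (1.0 - g))
```

With a 1 ms delay (a path difference of about one foot) the first notches land near 500, 1500 and 2500 Hz. In this model, dropping the delayed signal by 10 dB reduces the peak-to-dip ripple from total cancellation to roughly 5.7 dB.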

It's amazing how many engineers think they can
fix the reflections from a singer's music stand by
adding a piece of carpet. But carpet has no
meaningful effect in the range below about 5 kHz,
and as you can see from the figure, that's where the
major problems are. Another example of comb
filtering is when the sound from an instrument
reaches the microphone both directly and also via
reflections from the floor. Nearfield monitoring is
inherently inaccurate because the sound from the
speakers reaches the ear directly and also via a
bounce off the console top, yielding very uneven
frequency response.



Television and film soundtracks provide excellent
laboratory exercises in learning how comb-filters
mutilate sound, since the proper operation of a
lavalier microphone depends on indirect sound,
including reflections from nearby surfaces. Listen
to the weather report blindfolded and create a
play-by-play based on your ear's perception of where the
weatherperson must be: "Now she's crossed her
hands on her chest, about 3" below the lavalier
microphone. Now she's turned around to face the
blue screen, about 2 feet away. Now she's uncrossed
her hands and is walking away from the screen.
She's sitting down at the anchor desk for the
discussion and you can hear from the hollow dip at
500 Hz that her mike is about a foot above the desk.
Uh-oh, the mix engineer has opened a second
microphone and the anchorman's voice is leaking
into her mike from a couple of feet away." The ear
really begins to notice comb filtering when the delay
is changing, for example, the classic flanging effect
when an artist sways to and fro in front of a
reflecting music stand. That's why the best music
stand is none at all; open-wire stands are
second-best and careful placement does the rest.






What does comb-filtering have to do with audio mastering? The answer is that learning to identify its effects is an excellent earientation exercise. The figure shows that comb filtering is extremely difficult to remove with an equalizer. And a corrective equalizer would be especially problematic in mastering, since the equalization affects the entire mix, not just the instrument that needs fixing. Ideally, comb-filtering should be prevented before the mix gets to mastering by using acoustic know-how. Unfortunately, comb-filtering problems are more common than you’d believe. By the way, did you know that wearing a hat with a brim puts a notch in your hearing at around 2 kHz? Comb-filtering is all around us. To hear comb-filtering right now, talk into your cupped hands, then take them away while still talking. Learn to recognize the effect blindfolded. Or walk into an announce booth with your eyes closed, talk into the window and see how close you have to get to it before you notice the coloration.
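
As a rough aid to this exercise, the notch frequencies of a comb filter can be estimated from the extra distance the reflected sound travels. The sketch below is an illustration rather than anything taken from the text: it assumes a single reflection as strong as the direct sound, a speed of sound of 343 m/s, and a made-up function name, `comb_notches_hz`.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def comb_notches_hz(path_difference_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Return the first `count` notch frequencies for one delayed reflection.

    A reflection delayed by tau seconds cancels the direct sound where the
    delay equals an odd number of half periods: f = (2k + 1) / (2 * tau).
    """
    tau = path_difference_m / c
    return [(2 * k + 1) / (2.0 * tau) for k in range(count)]

# A reflection that travels 0.343 m (about 13.5 inches) farther arrives 1 ms late.
print([round(f) for f in comb_notches_hz(0.343)])  # [500, 1500, 2500]
```

Real reflections are weaker than the direct sound and change with position, so the notches are shallower and move as the talker moves; the arithmetic only shows why a short extra path puts the first dips in the midrange.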

Ear Training Exercise #4: The Sound of Great Recordings Well-Reproduced; Perception of Dynamics, Space and Depth

Many mastering engineers are privileged to work on a wide variety of music throughout the week; there’s never a dull moment. Train your ears to recognize good recorded sound in each genre. Start by becoming familiar with the sound of great recordings made with purist mike techniques, little or no equalization or compression. Learn what wide dynamic range and clear transients sound like captured and reproduced, which will help you recognize limited dynamic range material when it is played. The percussive impact of real life is the standard that can never be bettered. It’s an exhilarating, incomparable live experience to stand directly in front of a live big band. Next, compare the depth which can be captured with simple miking techniques and which is lost when multiple miking is used.

Ear Training Exercise #5: The Proximity Effect Game

Take the opportunity to experience and reference the sound of live, unamplified music. I’ll never forget the wonderful artist who broke into song in my mastering room. There’s no greater privilege than to receive a private, live unamplified concert given just for you by a world-class vocalist. Seek out those rare opportunities. Listen to your singer rehearsing without a microphone; check out the natural tonality, clarity and incredible dynamics of a voice that’s singing and projecting.

Now compare that natural sound with engineers' use of proximity effect, which is the increase in bass response when a directional microphone is moved closer to the source. Most recorded pop vocals have greater lower midrange and presence than real life. The trick is to use just enough to make it sound "super-natural" but not muddy, thick, sibilant, bright or edgy.

Ear Training Exercise #6: The Sound of Overload

Many amplifiers have their own unique sound, probably attributable to subtle differences in harmonic structure. When solid-state amplifiers are driven into overload, they clip: the round part of their output waveform starts to square off. Clipping is a form of severe overload; some amps (particularly tube amps) overload gracefully, and can be used as a form of compressor, making sounds fatter when you push them past their linear region. Others clip drastically, producing lots of high, odd harmonic distortion. Learn to identify the sound of overload in all its forms: analog tape reaching saturation, analog tape in severe saturation, overdriven power amplifiers producing intermodulation distortion, optical film distortion (as in classic 1930’s talkies), and so on. As a first training exercise, study the saturation on peaks of a classical or pop recording made from analog tape versus a modern all-digital recording. You may prefer one type of overload to the other. As a benefit of this ear-training, you will begin to learn the characteristics of each piece of gear you encounter; become a master of the gear instead of it mastering you. Soon you’ll discover some rare digital gear that overloads more gently than others.

Ear Training Exercise #7: Identify the Sound Quality of Different Reverb Chambers

Artificial reverb chambers have progressed tremendously over the years. Become familiar with the artifacts of different models of reverbs. Some models exhibit extreme flutter echo, some sound very flat, while others produce an excellent simulation of depth. We’ll learn a bit about how they accomplish this in Chapter 17.

Non-Exercise: Recognize Bad Edits, Wow and Flutter, 
Polarity Problems 

Bad Edits: I’m so paranoid I sometimes think I can hear edits at concerts! But seriously, an experienced mastering engineer should be able to recognize a bad edit in a tape, where the ambience or the sound is partially cut off, or the sound partially drops out. I don’t have any specific exercises to recommend except to apprentice/practice with an experienced editing engineer who will listen to your edits and point out their faults.




Wow and Flutter: Wow and flutter are caused by speed variations in recordings, and are no longer a problem with digital recording. But mastering engineers are often called upon to restore older analog recordings. So to enhance your perceptual acuity, make a cassette recording of a solo piano, and compare it side by side with a digital recording of the same instrument.

Polarity problems: Learn to recognize when the left channel of a recording is out of polarity with the right. Reverse the polarity of the wires to one loudspeaker and become familiar with the sound of the error, which is characterized by thin sound and a hole in the middle of the image. This will also help you to recognize when some instruments in a mix are out of polarity and others are in polarity.

In Summary 

Earientation should be a lifelong activity and no one can become an expert in one fell swoop. These exercises will help get you up to speed.


1 There is a specialized television engineer’s mixing technique to deal with mike leakage and avoid acoustic phase cancellation (comb filtering). Most women’s voices require a bit more gain, so for this discussion we made the weatherperson a woman and the anchorperson a man. Ride the level of one mike only, dropping it about 5 dB when the person is not talking; this should be the mike requiring the most gain (the quietest talker), because her voice will hardly leak into the anchorman's mike, but his will leak into hers. Watch her lips closely so as not to upcut her words.





CHAPTER 4: Wordlengths and Dither

I. Introduction

This chapter is about (pick one):

a) the smallest, most subtle, insignificant problem in digital audio

b) the biggest, most important problem in digital audio

If you picked both a) and b), then you are correct. Audio engineers must learn how to deal with and take advantage of wordlengths and proper dithering, but we must also keep our problems in perspective. If everything else in a project is right, then proper dithering is very important. But if the mix isn’t good, or the music isn’t swinging, then dither probably doesn’t matter very much. If we want to get everything right, and maintain the sound quality of the audio, we need to pay particular attention to the topics of this chapter.

II. Dither in the Analog Domain 

In an analog system, the signal is continuous, but in a PCM digital system, the amplitude of the signal out of the digital system is limited to one of a set of fixed values or numbers. This process is called quantization. Each coded value is a discrete step. For example, there are exactly 65,536 discrete steps, or values, available in 16-bit audio, and 16,777,216 discrete steps available in 24-bit audio. To calculate the approximate codable range of any PCM system, multiply the wordlength by 6; e.g. multiply 8 by 6 to get 48 dB for an 8-bit system. So the lowest value that can be encoded in 16-bit is 96 dB down from the top; in 24-bit it’s 144 dB. In a moment we will introduce the concept of dither, but if a signal is quantized without using dither, there will be quantization distortion related to the original input signal. This can introduce harmonics, subharmonics, aliased harmonics, intermodulation, or any of a set of highly undesirable kinds of distortion. In order to prevent this, the signal is dithered, a process that mathematically removes the harmonics or other highly undesirable distortions entirely, and replaces them with a constant, fixed noise level.
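
The "multiply by 6" rule of thumb can be checked against the exact figure of 20·log10(2) ≈ 6.02 dB per bit. The short sketch below is only an illustration; the function name `codable_range_db` is invented for it.

```python
import math

def codable_range_db(bits):
    """Exact ratio between full scale and one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)

# Columns: wordlength, number of steps, rule-of-thumb range, exact range
for bits in (8, 16, 24):
    steps = 2 ** bits
    print(bits, steps, 6 * bits, round(codable_range_db(bits), 1))
# 8 256 48 48.2
# 16 65536 96 96.3
# 24 16777216 144 144.5
```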

Here’s a simple thought experiment that explains why dither is necessary and how it works.* Let’s create a basic A/D converter. We’ll make it sensitive to DC, and bipolar, so it responds to both positive and negative analog inputs, and we’ll give it a very big LSB threshold of 1 volt to make the numbers easy. We’ll construct our ADC so that an analog source over the range between -.5 volts and +.5 volts produces a digital output word of 0, and an analog source over the range between +.5 volts and 1.5 volts produces an output of 1, and so on. If, without applying any dither, we present a 0.25 volt DC (continuous) signal to the input of the ADC, the output of the ADC will be a string of zeros. In fact, any signal between -0.5 and 0.5 volt will result in an ADC output of zero. Any information below the LSB threshold is completely lost, as illustrated above.

Figure: Graph of a hypothetical ADC whose LSB threshold is 1 volt (+ or - 0.5 volts). Each sampled analog input is represented by a small orange square; in this example, the analog source is held at a continuous 0.25 volt. Note that any input between -.5 volt and +.5 volt will be lost, because it is below the threshold of the LSB, producing a string of zeros. Because it is below threshold, a DC signal held continuously at 0.25 volts will not be detected.

Remove the 0.25 volt signal and apply dither to the input of the ADC in the form of a completely random signal (i.e., noise) centered around 0 volts. Its peak amplitude randomly toggles the LSB of the ADC. The output of the ADC will be a stream of very small random values. However, the average of all these values will be zero.


Now let’s apply our 0.25 volt signal again (with the dither on). The two analog voltages sum together, the dither and our signal. At each sample point (in time), the 0.25 value of our analog source is added to the random dither value. The output stream will again look like a stream of very small random numbers, but guess what? The AVERAGE of all those numbers will now be... you guessed it, 0.25. We have thus retained the information that was previously lost (even though it’s buried in "noise"). In other words, our resolution has improved. The conversion is still essentially random, but the presence of the 0.25 volt signal biases the randomness. Put another way, the characterization of the system with dither on is transformed from completely deterministic to one of statistical probability. The periodic alternation of the LSB between the states of 0 and 1 results in encoding a source value that is smaller than the LSB. In other words, on the average, the LSB puts out a few more ones than zeros because of our +0.25 volt signal. We say that dither exercises or toggles or modulates the LSB.¹

Figure: Random dither applied to the ADC whose highest peak-to-peak value is slightly greater than the LSB and whose average value is zero volts.

* Courtesy of Mithat Konar, director of engineering, biró technology. Also, many thanks to Jim Johnston for helping me to diagram this visually.

With the dither on, we can now change the input signal over a continuous range and the average of the ADC output will track it perfectly. An input signal of 0.373476 volts will have an average ADC output of (the binary equivalent of) 0.373476. The same will hold true of inputs going over the LSB threshold: an input of 3.22278 will have an average ADC output of 3.22278. So not only has the dither enhanced the resolution of the system to many decimal places, but it has also eliminated "stepping" quantization effects!

Dither actually extends the resolution of a digital system, and in addition to being able to record and reproduce all the analog values at high and medium levels, dither lets us encode low level signals below the -96 dB limit!² These results, resolution enhancement and the elimination of quantization distortion, cannot be achieved by adding noise after the A/D conversion. So dither must be added at the proper point in the circuit, and adding noise is not the same as dithering.
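
The thought experiment is easy to try numerically. The sketch below models the hypothetical 1-volt-LSB converter and is an illustration only: it assumes uniform dither spanning exactly one LSB peak-to-peak (the figure describes it as slightly greater), and the function names are invented for the example. It also shows the last point, that noise added after conversion recovers nothing.

```python
import math
import random

def adc(volts):
    """Hypothetical ADC with a 1-volt LSB: -0.5..+0.5 V reads 0, +0.5..1.5 V reads 1."""
    return math.floor(volts + 0.5)

def average_output(signal_v, n=200_000, dither=True, seed=1):
    """Average ADC code for a DC input, with or without dither before conversion."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        # Assumption: uniform dither spanning exactly one LSB peak-to-peak.
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += adc(signal_v + noise)
    return total / n

def average_noise_after(signal_v, n=200_000, seed=1):
    """Quantize first, then add noise: the sub-LSB information is already gone."""
    rng = random.Random(seed)
    return sum(adc(signal_v) + rng.uniform(-0.5, 0.5) for _ in range(n)) / n

print(average_output(0.25, dither=False))   # 0.0: the signal is lost
print(round(average_output(0.25), 2))       # close to 0.25
print(round(average_output(3.22278), 2))    # close to 3.22
print(round(average_noise_after(0.25), 2))  # close to zero, not 0.25
```

The averages only converge on the input as the number of samples grows; any single output word is still just 0 or 1 (or 3 or 4 in the third case).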

Dither's resolution enhancement is truly physical/mathematical in nature, not merely a trick which fools the ear. Dither is not simply a means "to mask the low level digital breakup." The psychoacoustic side of the explanation is that human beings are able to hear signals in the presence of noise of greater energy than the signal, i.e., with negative signal-to-noise ratios. In practice, we can hear signals about as far as 15 to 20 dB below the LSB, so a properly-dithered 16-bit recording can have a perceived dynamic range about as great as 115 dB. But its signal to noise (signal to dither) ratio will only measure about 91 dB, since the addition of the dither raises the noise floor about 5 dB.³ Regardless, we can hear signals below the noise, which explains why the perceived dynamic range of the dithered system is greater than its codability.

Every well-made 16-bit A/D incorporates dither to linearize the signal. If you were lucky enough to have a 20-bit or 24-bit A/D and 24-bit storage to begin with, then dither is probably not necessary during the original analog encoding. Although the inherent thermal noise on their inputs is not shaped to perfectly dither the source, current 20-bit A/Ds self-dither to some degree around the 18-19 bit level because of this basic physical limitation. Similarly, a transfer from typical analog tape probably has enough hiss to self-dither any transfer to 16 bits, as long as there is no digital processing before storage. But I believe there is a slight advantage to encoding any transfer at 20 bits or above because the ear can hear signal below the noise; it certainly doesn’t hurt to encode at 24 bits, except for taking up more storage space.

The dynamic range of an A/D converter at any frequency can be measured without an FFT analyzer. All that you need is an accurate test tone generator and a low-noise headphone amplifier with sufficient gain. To conduct the test, simply listen to the analog output and see when it disappears (use a real good D/A for this test). Another important test is to attenuate music in a workstation (about 40 dB) and listen to the output of the system with headphones. Listen for ambience and reverberation; a good system will still reveal ambience, even at that low level. Also listen to the character of the noise; it’s a very educational experience.

MYTH: Adding noise is the same as dithering.

III. The Need for (re)Dither in the Digital Domain

The First Secret of Digital Audio: How Wordlengths Expand

Even once the signal has been turned into numbers, under many circumstances we are still not exempt from the need for further dithering. Unfortunately, many processor and DAW manufacturers still have not recognized this fact,⁴ and this partly explains why some digital devices sound pure and sweet, while others are cold and harsh. The reason: as soon as you transform audio by changing its level, equalizing, compressing, or nearly any other sort of calculation, you have also increased its wordlength! Which means that the sound quality of your music will be deteriorated if you simply truncate the output to 16 bits or any shorter wordlength. Let’s see how that happens, and how we can prevent the problem.

Here’s a simplified lesson in DSP (Digital Signal Processors). Digital audio is all arithmetic, but the accuracy of that arithmetic, and how the engineer (or the workstation) deals with the arithmetic product, can make all the difference between pure-sounding digital audio and digital sandpaper. All DSPs deal with digital audio on a sample by sample basis. At 44.1 kHz, there are 44,100 samples in a second (88,200 stereo samples). When changing gain, the DSP looks at the first sample, performs a multiplication, spits out a new number, and then moves on to the next sample. It’s that simple.

To avoid unnecessarily complicated esoterica like 2’s complement notation, fixed versus floating point, and other digital details, I’m going to invent the term digital dollars. Suppose that the value of your first digital audio sample is expressed in dollars instead of volts, for example, a dollar 51 cents: $1.51. And suppose you want to take it down (attenuate it) by 6 dB. If you do this wrong, you'll lose more than money, by the way. 6 dB is half the original value.⁵ So, to attenuate our $1.51 sample, we divide it by 2.

Oops! $1.51 divided by 2 equals 75-1/2 cents, or $0.755. So, we’ve just gained an extra decimal place. What should we do with it? It turns out that dealing with extra places is what good digital audio is all about. If we just drop the extra five, we’ve theoretically only lost half a penny, but back in the audio world that "half a penny" contains a great deal of the natural ambience, reverberation, decay, warmth, and stereo separation that was present in the original $1.51 sample! Lose the half penny, and there goes your sound. The dilemma of digital audio is that most calculations result in a longer wordlength than you started with. Getting more decimal places in our digital dollars is analogous to having more bits in our digital words. When a multiplication or division is performed, the wordlength can increase infinitely, depending on the precision we use in the calculation. A 1 dB gain boost involves multiplying by 1.122018454 (to 9 place accuracy). Multiply $1.51 by 1.122018454, and you get $1.694247866 (try it on your calculator).

Each individual decimal place may seem insignificant, but DSPs require repeated precision calculations to perform filtering, equalization, and compression, and the end number may not resemble the right product at all unless adequate precision is maintained. Remember, the more precision, the cleaner your digital audio will sound in the end (up to a reasonable limit).
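
The digital dollars arithmetic can be reproduced exactly with decimal numbers. This is only a sketch of the idea; the helper name `db_to_gain` is invented, and decimal fractions stand in for the binary words a real DSP would use.

```python
from decimal import Decimal

def db_to_gain(db):
    """Convert a level change in decibels to a linear multiplier."""
    return 10.0 ** (db / 20.0)

sample = Decimal("1.51")                    # the $1.51 sample from the text
halved = sample / 2                         # roughly 6 dB of attenuation
boosted = sample * Decimal("1.122018454")   # 1 dB boost, coefficient to 9 places

print(halved)                     # 0.755 (one more decimal place than we started with)
print(boosted)                    # 1.69424786554 (eleven decimal places)
print(round(db_to_gain(1.0), 9))  # 1.122018454
```

A calculator that shows nine places displays the product as 1.694247866; the exact product already needs eleven, which is the sense in which wordlengths expand.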

So this is the first critical secret of digital audio: word lengths expand. But if this concept is so simple, why is it ignored by too many manufacturers? The answer is simply cost. While DSPs are capable of performing double and triple precision arithmetic (all you have to do is store intermediate products in temporary storage registers), it slows them down, and complicates the whole process. It’s a hard choice, entirely up to the DSP programmer/processor designer, who has probably been put under the gun by management to fit more program features into less space, for less money. Questions of sound quality and quantization distortion can become moot compared to the selling price. In Chapter 16 we’ll try to learn whether processors which measure better also sound better. It's a safe bet to say that high horsepower is both costly and better-sounding.


Inside a digital mixing console (or workstation), the mix bus must be much longer than 16 bits, because adding two (or more) 16-bit samples together and multiplying by a coefficient (the level of the master fader is one such coefficient) can result in a 32-bit (or larger) sample, with every little bit significant.⁶ Since the AES/EBU standard can carry up to 24 bits, it is practical to take the internal long word, bring it down to 24 bits, then send the result to the outside world, which could be a 24-bit storage device (or another processor). The next processor in line may have an internal wordlength of 48 or more bits, but before output it too must reduce the precision back to 24 bits. The result is a slowly accumulating error in the least significant bit(s) from process to process. Fortunately, the least significant bit of a 24-bit word is 144 dB down, and most sane people recognize that degree of error to be inaudible,⁷ but only as long as the processors reduce their respective long word lengths properly to 24 bits on the way out.
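
To see how quickly a mix bus outgrows 16 bits, count the bits needed after one sum and one multiplication. The sketch is illustrative only: it assumes integer samples, a hypothetical 16-bit fader coefficient, and an invented helper, `signed_bits`.

```python
def signed_bits(value):
    """Bits needed to hold a non-negative `value` in a signed integer word."""
    # One extra bit is reserved for the sign.
    return value.bit_length() + 1

PEAK_16 = 2 ** 15 - 1          # largest positive 16-bit sample, 32767
FADER = 2 ** 15 - 1            # hypothetical 16-bit fader coefficient

mixed = PEAK_16 + PEAK_16      # two full-scale samples summed on the bus
scaled = mixed * FADER         # then multiplied by the fader coefficient

print(signed_bits(PEAK_16))    # 16
print(signed_bits(mixed))      # 17
print(signed_bits(scaled))     # 32
```

Every additional channel or longer coefficient pushes the count higher, which is why the internal word has to be reduced deliberately on the way out.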

Something For Nothing? 

But suppose we want to record the digital console's output to a CD recorder, which only stores 16 bits. Frankly, it’s a meaningful compromise to take a console’s 24-bit output word and truncate it to 16 bits. Even if the source (multitrack) is 16-bit, there is an advantage to using the 24-bit output of the console or DAW. Similarly, there’s only one right way to use a digital compressor or equalizer or reverb or other processor: Record its 24-bit output onto a 24-bit medium. And processors or consoles that purportedly produce a 16-bit output from a 16-bit input are throwing away bits! The same is true for those inexpensive programs built into computers which take in audio CDs and allow you to manipulate the sound and write a new CD. Critical listeners immediately realize you don’t get something for nothing. Greater resolution and better audio quality can be achieved by mixing with an analog console to a 30 IPS, 1/2" analog tape than by passing the signal through a digital console that truncates its internal wordlength to 16 bits. If the console dithers its output to 16 bits instead of truncating (check with the manufacturer), the situation is a little better, but even dithering has its compromises, as we shall see.

How Dither Works in the Digital Domain 

Since truncation⁸ is so bad, what about rounding? In our digital dollar example, we ended up with an extra 1/2 cent. In grammar school, they taught us to round the numbers up or down according to a rule (we learned "even numbers...round up, odd...round down"). But rounding produces little better results than truncation, perhaps adding half a bit additional precision, but with lots of correlated quantization distortion. So, when we’re dealing with more numerical precision and small numbers that are significant, we still have to use dither noise to bring the information from the LSBs into the bits we intend to use.

The logic is the same as we described in the analog domain, except the processor must generate the dither digitally, as a series of random numbers, simulating the randomness of analog dither. This is often called redithering, because the signal may have been already dithered during the encoding (recording) process. But the advantage of the original dither becomes moot once we have reprocessed the audio, and we must dither all over again to preserve resolution before truncation. In the analog example, we learned that the encoded signal plus dither noise contains all the low level information below the LSB, because we added the analog dither to the low level analog signal.

Similarly, in the digital domain, we can add two digital numbers together, one of which is a random number, representing random noise.

To do this, we calculate random numbers and add a different random number to every sample. Then, cut it off at 16 bits (or whatever shorter wordlength we desire). The random numbers must also be different for left and right samples, or else stereo separation will be compromised.

For example:

Starting with a 24-bit word (each bit is either a 1 or a 0 in binary notation):

                       --Upper 16 bits--     --Lower 8--
    Original 24-bit    MXXX XXXX XXXX XXXW   YYYY YYYY
    Add random number                        ZZZZ ZZZZ

The result of the addition of the Z’s with the Y’s gets carried over into the new least significant bit of the 16-bit word (LSB, letter W above), and possibly higher bits if you have to carry. Just as in the analog example, the random number sequence combines with the original lower bit information, modulating the LSB. The result is that much of the sound quality of the long word is carried up into the shorter word.

Random numbers such as these translate to random noise (hiss) when converted to analog, and this hiss is audible if listening carefully with headphones.
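
The add-then-cut-off step described above can be sketched in a few lines. This is a simplified illustration, not any product's algorithm: it assumes uniformly distributed 8-bit random numbers, integer samples, and an invented function name, `redither_24_to_16`.

```python
import random

def redither_24_to_16(sample_24, rng):
    """Add a random number to the lower 8 bits, then keep the upper 16 bits."""
    z = rng.randrange(256)            # the ZZZZ ZZZZ of the diagram: 0..255
    dithered = sample_24 + z          # any carry ripples up into bit W and above
    word_16 = dithered >> 8           # drop the lower 8 bits (truncate)
    return max(-32768, min(32767, word_16))  # guard against overflow at full scale

rng = random.Random(7)
# A constant 24-bit value that sits one quarter of the way between two 16-bit codes.
source = (1000 << 8) + 64
truncated = source >> 8
outputs = [redither_24_to_16(source, rng) for _ in range(100_000)]

print(truncated)                              # 1000: the lower 8 bits are simply lost
print(round(sum(outputs) / len(outputs), 2))  # close to 1000.25
```

For stereo, a separate random stream would be used for each channel, as the text requires; one generator per channel is a simple way to arrange that.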

Some Tests for Linearity

Whether a digital audio workstation truncates digital words or does other nasty things can be verified without any measurement instruments except your ears. Track 42 of Best of Chesky Classics and Jazz and Audiophile Test Disc, Vol. III⁹ is a fade to noise without dither, demonstrating quantization distortion and loss of resolution. Track 43 is a fade to noise with white noise dither, and track 44 uses noise-shaped dither (to be explained). Using Track 43 as the test source, it is possible to hear smooth and distortion-free signal down to about -115 dB. Track 44 shows how much better it can sound. If we then process track 43 with digital equalization or level changes (both gain and attenuation, with and without dither) we can hear what they do to the sound. If the workstation is not up to par, the result can be quite shocking. Alternatively we can send the output of the test from the workstation to a CD recorder, load the CD back in, and raise the gain of the result 24 to 40 dB to help reveal the low level problems. The quantization distortion of the 40 dB boost will not mask the problems you are trying to hear, although it’s theoretically better if dither can be added for the big boost.

So Little Noise—So Much Effect 

91 dB seems like so little noise. But strangely, astute listeners have been able to hear the effect of the dither noise, even at normal listening levels. Dither noise helps us recover ambience, but conversely it also obscures the same ambience we’ve been trying to recover! Dither at the 16-bit level adds a slight veil to the sound. That’s why I say, dither, you can’t live with it, and you can’t live without it.

Improved Dithering Techniques

However, where there’s a will, there’s a way. Although the required amplitude of 16-bit dither is about -91 dB, it’s possible to shape (equalize) the dither to minimize its audibility. Noise-shaping techniques re-equalize the spectrum of the dither while retaining its average power, effectively moving the noise away from the areas where the ear is most sensitive (circa 3 kHz), and into the high frequency region (10-22 kHz).
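
The principle can be illustrated with the simplest possible shaper: first-order error feedback. This is a deliberately swapped-in toy, not POW-R or any of the high-order psychoacoustic curves discussed in this chapter; the triangular dither and the function name `shape_to_int` are assumptions made for the sketch.

```python
import math
import random

def shape_to_int(samples, rng):
    """Requantize float samples (in LSB units) with first-order error feedback."""
    out = []
    err = 0.0                                # error made on the previous sample
    for x in samples:
        v = x - err                          # subtract the last error from the input
        d = rng.random() - rng.random()      # triangular dither, under 1 LSB peak
        y = math.floor(v + d + 0.5)          # round to the nearest integer code
        err = y - v                          # error made on this sample
        out.append(y)
    return out

rng = random.Random(3)
signal = [0.25] * 10_000                     # a DC level a quarter LSB above zero
coded = shape_to_int(signal, rng)
print(round(sum(coded) / len(coded), 3))     # 0.25
```

Because each output differs from its input by the difference between the current and previous errors, the errors cancel over time: little noise is left at low frequencies and more is pushed toward high frequencies. Commercial shapers use much higher-order filters fitted to the ear's sensitivity.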

On the next page is a graph of the amplitude versus frequency of one of the most successful noise-shaping curves, POW-R dither, type 3.

This is clearly a very high-order filter, requiring considerable calculation, with several dips where human hearing is most sensitive. It is the inverse of the "F" weighting curve, which defines the low-level limit of human hearing. The sonic result is an incredibly silent background, even on a 16-bit CD. Chapter 16 studies these effects in more detail.

There are numerous noise-shaping redithering devices on the market. Very high precision (56 to 72 bit) arithmetic is required to calculate these random numbers with justice. One box uses the resources of an entire DSP chip just to calculate dither, with 72-bit precision arithmetic. The sonic results of these noise-shaping techniques range from very good to marvelous. The best techniques are virtually inaudible to the ear: all the dither noise has been pushed into the high frequency region, which at -60 or -70 dB is still inaudible. Critical listeners were complaining that the high frequency rise of the early noise-shaping curves changed the tonality of the sound, adding a bit of brightness. But it turns out that psychoacoustically, it is the shape of the curve in the midband that affects the tonality, due to masking. A couple of the latest and best of these noise-shaping dithers are virtually tonally neutral, to my ears. It took a long time to get there (about 10 years of development), but I feel that the best of these processors yield 19-20 bit performance on a 16-bit CD, with virtually no tonal alteration or loss of ambience from the 24-bit source.

Noise-shapers on the market include: Lavry Engineering model 3000 Digital Optimizer, Meridian Model 618, Sony Super Bit Mapping (SBM), Waves L1 and L2 Ultramaximizers, Prism, POW-R, and several others.

Apogee Electronics produced the UV-22 system in response to complaints about the sound of earlier noise-shaping systems, declaring that 16-bit performance is just fine. They do not use the word "dither" (because their noise is periodic, they prefer to call it a "signal"), but it smells like dither to me. Instead of noise-shaping, UV-22 adds a carefully calculated noise at around 22 kHz, without altering the noise in the midband.

Pacific Microsonics has produced the HDCD (High Definition Compatible Disc) system, which incorporates one of the best A/D converters with an encode-decode system. Special codes are buried in the 16th bit (LSB) along with standard dither; these codes inform HDCD-equipped D/A converters how to alter their gain structure so as to produce 20-bit or better quality, but only on the proper D/A converter. When an HDCD DAC is not used, the sound quality is reduced to that of a standard CD. However, if the mastering engineer manipulates some extra features of the HDCD system, known as peak extension and low level, then the music sounds compressed on a standard CD player and can only be properly reproduced (without compression) on an HDCD player/DAC. Despite its name, HDCD, if manipulated aggressively, is not compatible with regular playback. The sound quality of the Pacific A/D is very nice; it’s regrettable that the license requires all CDs made from that converter to be HDCD-encoded, so we cannot legally choose to use another manufacturer's dither with the Pacific A/D.

We can effectively compare the sound and resolution of these redithering techniques by performing a low level test with music. We simply feed low level 24-bit music (around -40 dB) into the processor, and listen to the output at high gain in a pair of headphones with a good quality 16-bit D/A converter, or a higher resolution D/A auditioned through a truncation device.¹⁰ The sonic differences between the systems can be shocking: Some will be grainy, some noisy, and some distorted, indicating improper dithering or poor calculation. Though the winner of this test will probably be the best choice of dithering processor, also audition the music at normal monitor levels, because the psychoacoustic effect of the dither will be different and the high frequency noise less bothersome.

The Cost of Cumulative Dithering at 16 bits 

As we have already seen, the measured
amplitude of 16-bit dither is extremely low,
approximately -91 dBFS. But a skilled listener does not
have to listen at a very high level to hear the
degradation of improper dithering. When feeding
processors, DAWs or digital mixers to a shorter
wordlength medium, dither should always be
applied to the output of the processor, because
dithering always sounds better than truncation
without dither.[11] But since dithering to 16 bits adds a
slight veil to the sound,[12] cumulative dithering to 16
bits (multiple generations of 16-bit dither) should be
avoided: redithering to 16 bits should be the
one-time, final process in the project. Mix to a long
wordlength medium and send that file to the
mastering house, which will apply 16-bit dither
once, at the tail end of the project.
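
To put a rough number on "cumulative," here is a small sketch. It assumes each generation adds the same amount of uncorrelated dither noise, so the noise powers simply add; the function name is hypothetical, and the arithmetic says nothing about how audible the veil is, which remains a listening judgment.

```python
import math


def cumulative_noise_dbfs(single_pass_dbfs, passes):
    """Noise floor after `passes` generations of equal, uncorrelated dither.

    Uncorrelated noise sources add in power, hence 10 * log10(passes).
    """
    return single_pass_dbfs + 10.0 * math.log10(passes)


for passes in (1, 2, 4, 8):
    print(passes, round(cumulative_noise_dbfs(-91.0, passes), 1))
```

Under these assumptions, every doubling of the number of 16-bit dither generations raises the floor by about 3 dB.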

The Sound Effects of Defective Digital Processors 

Since digital processors are computers
programmed by human beings, we have to be sure to
Question Authority, never taking a digital processor,
or any DAW or computer that processes audio, for
granted. For example, when software is changed or
updated, we should never assume that the
manufacturers have found all the bugs, and we should
assume that they may have created new ones. We even
need to ensure that BYPASS mode, which seems
seductively simple, actually does produce true
clones in bypass. The illustration below
(courtesy of Jim Johnston) shows a series of FFT
plots of a sine wave, illustrating the type of
non-linear distortion products produced by truncation
without dithering. The top row is an undithered
16-bit sine wave. Note the distortion products (vertical
spikes at regular intervals, not harmonically related
to the source wave). The second row is that sine wave
with uniform dither. Note how the distortion
products are now gone. The bottom row is the
formerly dithered sine wave, going through a
popular model of digital processor with a defective
BYPASS switch, and truncated to 16 bits. This is
what would happen if a (16-bit) CD was fed through
this processor in so-called BYPASS mode, and
dubbed to a CDR!


[Figure: FFT plots of a sine wave at 240 Hz, 1 kHz and 17 kHz.
Top row: sine wave, 16 bits, no dither.
Middle row: sine wave, 16 bits, uniform dither.
Bottom row: dithered sine multiplied by 1 - 2^-24
(bypass mode of a popular effects box).]


MYTH:

Expanding the wordlength of the samples from 16
to 24 (or 32) makes the sound better.



This is why every processor should be tested for 
bit transparency before attempting to make master- 
quality work with those processors patched into the 
signal chain. 
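
A bit-transparency test boils down to comparing what went in with what came out, sample for sample. The sketch below assumes both streams have already been captured as lists of integers and time-aligned (a real processor usually adds latency). The "defective bypass" is a hypothetical model taken from the caption of the illustration, a multiplication by 1 - 2^-24 followed by truncation; it is not the behavior of any specific product.

```python
import math


def first_difference(sent, received):
    """Index of the first sample that differs, or None if bit-identical."""
    if len(sent) != len(received):
        return min(len(sent), len(received))
    for index, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return index
    return None


def defective_bypass(sample16):
    """Hypothetical faulty bypass: tiny gain change, then truncation."""
    return math.floor(sample16 * (1.0 - 2.0 ** -24))


source = [0, 1, -5, 100]
clean_copy = list(source)
damaged_copy = [defective_bypass(s) for s in source]
```

With these values `first_difference(source, clean_copy)` returns `None`, while the damaged copy already differs at index 1, because every positive sample has been dragged down by one LSB.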

IV. Some Practical Dithering Examples 
and Guidelines 

1) When reducing wordlength you must add
dither. Example: From a 24-bit processor to a
16-bit DAT.

2) Avoid dithering to 16 bits more than once on
any project. Example: Use 24-bit intermediate
storage; do not store intermediate products on
16-bit recorders.

3) Wordlength increases with almost any DSP
calculation. Example: The outputs of digital
consoles, DAWs and processors will be 24-bit
even if you start with a 16-bit source.

4) Every "flavor" of dither and noise-shaping type
sounds different. It is necessary to audition any
"flavor" of dither to determine which is more
appropriate for a given type of music. The most
transparent-sounding dither may not be
appropriate for "grungy" rock.

5) In any project, sample rate conversion should
be the next-to-last operation, and dithering to
the shortest wordlength must be last.
Intermediate dithering may occur "behind the
scenes," e.g. from 48 to 24 bits prior to feeding
out of a processor. Truncation (without
dithering) to 24 bits sounds far less bothersome to
the ear* than truncating to 16 bits.


* Often, barely audible.


























6) When bouncing tracks with a digital console to a
digital multitrack, dither the mix bus to the
wordlength of the multitrack. If the multitrack
is 16-bit digital, that's a violation of #2 above, so
try to avoid bounces unless the multitrack is
20-bit (or better). Example: You have four tracks of
guitars on tracks 5 through 8, which you want to
bounce in stereo to tracks 9 and 10. You have a
20-bit digital multitrack. You must dither the
console outputs 9/10 to 20 bits. If you want to
insert a processor directly patched to tracks 9
and 10, don't dither the console, just dither the
processor to 20 bits.

One complication: The ADAT chips on certain
console interface cards are limited to only 20 bits.
Consult your console manufacturer. If the processor
has a true 24-bit interface, but the console's is only
20 bits, then you need to dither the console feed to
the processor to 20 bits and once again dither the
processor output to 20 bits to feed the multitrack!

The result will sound slightly warmer, wider, fuller.

V. Managing Wordlengths 

Many engineers believe that expanding the
wordlength of the existing samples in a workstation
improves the sound. This is incorrect. The sound
can never get more resolved than what was
originally encoded. Regardless of the source
sample's wordlength, the workstation will always
calculate to its highest precision, effectively adding
zeros to the tail of any shorter words to facilitate the
calculation (the padded zeros do not change the
original value). In other words, 16, 24 and 32-bit
samples can coexist in a well-designed workstation,
and when calculations take place, all samples will be
multiplied to the longer wordlength. Thus, there is
even an advantage to bouncing a 16-bit session
down to 24 bits, even though all the sources were
16-bit. The sonic difference may be subtle to
significant depending on the quality of the sources.
At the time of this writing, two workstations (Pro
Tools and Digital Performer) do not allow using
different source wordlengths in the same playlist,
due to some kind of architecture limitation. This is a
great inconvenience and a waster of time and space,
because all they do to convert the files is add
padding zeros. Perhaps because of this
inconvenience, neither of those workstations is
commonly used by mastering engineers, who
regularly mix wordlengths in the same session.
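
Both halves of this argument can be checked with integers. In the sketch below (function names are my own, and a simple gain change stands in for any DSP calculation), padding a 16-bit sample out to 24 bits leaves its value relative to full scale untouched, while a -6 dB gain change immediately produces information below the sixteenth bit, which is why the bounce deserves a 24-bit destination.

```python
def pad_16_to_24(sample16):
    """Append eight zero bits below a 16-bit sample."""
    return sample16 << 8


def fraction_of_full_scale(sample, bits):
    """Sample value relative to full scale for the given wordlength."""
    return sample / float(1 << (bits - 1))


def apply_gain_24(sample24, gain):
    """Scale a 24-bit sample and round the result back to 24 bits."""
    return round(sample24 * gain)


original = 12345                       # an arbitrary 16-bit sample value
padded = pad_16_to_24(original)        # same value, longer word
halved = apply_gain_24(padded, 0.5)    # roughly -6 dB
bits_below_16 = halved & 0xFF          # what a 16-bit store would discard
```

Here `padded` is 3160320, which is the same fraction of full scale as the original, and `bits_below_16` comes out nonzero: the calculation has lengthened the word even though the source was 16-bit.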

MYTH:

You can't mix source wordlengths in a single
workstation session.


Auto-Dither

We often have to combine previously-mastered
and dithered music with new material. If possible,
we try to avoid cumulative dithering to 16-bit by
passing the already-mastered source unmodified to
the output medium. There are a couple of ways to
accomplish this. The first is by using auto-dither by
source wordlength. The Sonic Solutions
workstations prior to HD had this useful facility
built in; in other words, if the source wordlength is
equal to or shorter than the destination wordlength,
then the dither generator shuts off automatically. At
this time, I know of only one model of external
dither processor that has this facility: the Prism
AD-2. In the absence of the Prism, or if we prefer
another type of dither, then we can route the
already-mastered material to another DAW stream,
direct to the output and bypassing the dither
generator. There are other kinds of auto-dither,
including auto-black, which turns off the dither if
the source audio level goes below a certain threshold
for a period of time, useful if the producer insists on
total silence between pieces.
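
The rule for auto-dither by source wordlength is simple enough to state as one comparison. The following sketch is only an illustration of that rule; the function name and the little playlist are hypothetical, and it assumes each segment passes to the output unmodified (any gain change or processing lengthens the word and would call for dither again).

```python
def dither_enabled(source_bits, destination_bits):
    """Auto-dither by source wordlength: dither only when shortening the word."""
    return source_bits > destination_bits


# Hypothetical playlist: (segment name, source wordlength in bits)
playlist = [("previously mastered track", 16), ("new material", 24)]
decisions = {name: dither_enabled(bits, 16) for name, bits in playlist}
```

For a 16-bit destination, the previously mastered 16-bit segment is passed with the dither generator off, and the new 24-bit material gets its one and only pass of 16-bit dither.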


1 In practice, it's more than just the LSB which is exercised. It can be all the
bits. In base 10, if we add two numbers, and the sum is greater than 9, we
have to carry. In base 2, we also have to carry and if the next significant digit
to the left is not a zero, we have to keep on carrying until the next digit up is a
zero and turn it into a 1. In 2's complement, the addition of dither at the LSB
level will affect the values of many digits, including the MSB, as the number
changes polarity between negative and positive. You can see this on a
bitscope, which seems to show two values at once because the numbers are
always toggling with the addition of dither.

2 More exactly, below the coding floor of any particular wordlength. In other
words, if we dither to 20 bits, whose coded range is 120 dB, we can encode low
level signals below the -120 dB limit. Or if we dither to 8 bits, we can encode
low level signals below 8-bit's normal limit of -48 dBFS.

3 The noise floor is raised 4.77 dB to be exact. This is the least amount of noise
necessary to properly dither a digital audio signal and eliminate all possible
distortion. The statistical distribution of the noise must be triangular
probability. You can read about the math behind this in Lipshitz and
Vanderkooy's papers as well as works by Bart Locanthi.

4 When I wrote an article about dithering around 1993, the situation was much
worse. Today, only the most stubborn, ignorant, or simply cheap console
manufacturers ignore the need for redithering in their products. And the
more aware manufacturers have begun to dither the internal longword (e.g.
48 bits) up to 24 instead of truncating at the 24th bit, which produces an
extremely subtle sonic improvement.

5 For signals which are correlated, the formula is dB change = 20 * log (ratio).
For example, if we drop the level by a ratio of 1/2, whose log is -.3010, then
multiply by 20, the approximate result is -6 dB (6 dB down), to the nearest
decibel. Note the use of the word approximate, and yes, the degree of
accuracy used in such calculations affects the quality of our audio.

6 To be exact, the low level (ambience) information that was present in the
original wordlength is now spread proportionally over a much longer
wordlength.

7 To put it another way, dither noise at -139 dBFS accumulates very slowly
before it could become audible, or interfere with audible ambience. At this
subtle a level, it's about the cumulative effect of multiple dithers (or lack of
same) when processes are chained. I recommend that all wordlength
reductions be dithered, even intermediate reductions from 48 to 24, for
example, because as the material is further processed, previous distortions
due to truncation start to be amplified and become audible as an edginess to
the sound. This is why I insert a 24-bit dither generator into my SADiE
workstation, when feeding external processors at 24 bits. Sonic Solutions
workstations perform this chore automatically, transparent to the user. Z
Systems equalizers provide optional dither at the 24th bit, which should be
engaged when processing. Weiss processors always dither when set to 24-bit
output wordlength; it is not a user-settable option.

8 According to Jim Johnston, there are several forms of truncation, depending
on the computer and the language in use, and none of them is good!

9 Chesky JD111, available at major record chains or through Chesky Records,
Box 1268, Radio City Station, New York, NY 10101; 212-586-7799 (I produced
this disc). The hard-to-find CBS CD-1, track 20, also contains a fade to noise
test.

10 You may use a DAT machine on E-E (Electronics to Electronics) to truncate
the signal, but be careful: some models of DAT machines actually pass 24 bits
through on E-E!

11 Unless you are specifically looking for grunge, and a particular type of grunge
at that. For the inharmonic distortion caused by quantization is very
unmusical to the ear. Very different-sounding than turning a Marshall
amplifier up to 11, for example. I'll take my grunge the old-fashioned analog
way, if you please! In other words, if a particular type of music is designed to be
aggressive, in your face, it still sounds better to me if that aggression is
obtained with a combination of high-resolution, pure-sounding (analog-like)
dither, and distortion-generating circuitry that produces musically
harmonic distortion. See Chapter 16 for more on this topic.

12 Since analog tape's noise floor is much higher than that of dither, many
would argue that several generations of 16-bit dither circa -91 dBFS should
be insignificant. I think it depends on the material. Pristine,
digitally-recorded material can sound veiled when "over dithered." But some
rock and roll sounds better with lots of noise, or with flat dither instead of
noise-shaped dither. And the psychoacoustic argument goes on, which is why
we have ears to make judgments!


CHAPTER 5

Decibels for Dummies


I. Introduction

This chapter summarizes the late 20th century
approach to metering and leveling; it can be read as
a preface to Chapter 15, in which we take these
concepts into the 21st century. In the 20th century,
because of their use of recording media with poor
signal-to-noise ratios (SNR), engineers were often
concerned with the signal peaks and with
maintaining quality by maximizing the levels. With
the advent of 24-bit recording, the SNR of our
media is no longer an issue, but it is still crucially
important for us to understand what the decibel
scales on our meters are really telling us.


So many of us take our meters for granted; after
all, recording is so simple: all you do is peak to 0 dB
and never go over! But things only appear that simple
until you discover one machine that says a recording
peaks to -1 dB while another machine shows an
OVER level, and yet your workstation tells you it just
reaches 0 dB! We need to explore the concepts of the
digital OVER, analog and digital headroom, machine
meters, gain staging, loudness, and signal-to-noise
ratio, and take a fresh look at the common practices of
dubbing and level calibration.

II. Digital Meters and OVER Indicators 

MYTH:

The red light came on while I was recording, but
when I played it back, there weren't any overs,
so I thought it was OK.*


Recorder manufacturers pack a lot in a little
box, often compromising on meter design to cut
production costs. A few machines even have meters
which are driven from analog circuitry, a definite
source of inaccuracy. Even manufacturers who drive
their meters digitally (by the values of the sample
numbers) cut costs by putting large gaps on the
meter scale (avoiding expensive illuminated
segments). The result is that there may be a -3 point
and a 0 dB point, with a large unhelpful no man's
land in between. The manufacturer may feel they're
doing you a favor by making the meter read 0 if the
actual level is between -1 and 0, but even if the
meter has a segment at every decibel, when it comes
to playback, the machine can't tell the difference
between a level of 0 dBFS (FS = Full Scale) and an
OVER. That's because once a signal has been
recorded, it cannot exceed full scale again, as
illustrated below.



While an original analog signal can exceed the amplitude of 0 dB, when that
recording is reproduced, there will be no level above 0, yielding a distorted
square wave. This diagram shows a positive-going signal, but the same is true
on the negative-going end.

One way a signal can go OVER is during
recording from an analog source. An early-warning
indicator is a level sensor in an A/D converter,
driven by the analog portion of the signal, which
causes the OVER indicator to illuminate if the
analog level is greater than the voltage equivalent to
0 dBFS. If the analog record level is not reduced,
then a maximum level of 0 dB will be recorded for
the duration of the overload, producing a distorted
square wave.


* Contributed by Lynn Fuston.


After the signal has been recorded,
distinguishing between a full scale recording and one that
actually went OVER requires more meter
intelligence than I've ever seen on a typical machine
or DAW. I would question the machine's
manufacturer if the OVER indicator lights on
playback; it's probably a simple 0 dB detector rather
than an OVER indicator. There are more
sophisticated, calibrated digital peak meters such as those
from Dorrough, DK, Mytek, NTT, Pinguin, RTW,
Sony, and others, each with unique features
(including custom decay times and meter scales),
but all the good meters agree on one thing: the
definition of the highest measured digital audio
level. A true digital audio meter reads the numeric
code of the digital audio, and converts that to an
accurate reading.[1]

The Paradox of the Digital OVER 

A well-designed digital audio meter can actually
distinguish between 0 dBFS and an OVER. But if the
digital levels on the medium cannot exceed 0 dB,
how can the meter distinguish an OVER after the
recording has been made? The answer is that a
specialized digital meter determines an OVER by
counting the number of samples in a row at 0 dB.
The Sony 1630 OVER standard is three contiguous
samples, because it's fair to assume that the analog
audio level must have exceeded 0 dB somewhere
between sample number one and three. Three
samples is a conservative standard; most
authorities consider distortion lasting only 33
microseconds (three samples at 44.1 kHz) to be
inaudible. Depending on the nature of the music,
distortion lasting as long as one or two milliseconds
is likely inaudible. Thus, at higher sample rates,
where many more samples go by in a short time, a
case can be made to count many more contiguous
full scale samples before warning the operator.
Manufacturers of digital meters often provide a
choice of setting the OVER threshold to 4, 5, or 6
contiguous samples, but it's better to err on the
conservative side, to let the meter warn you before a
problem could occur. If you stick with the 3-sample
standard, you'll probably catch audible OVERs. But
stand by, I'm about to recommend why you should
mix at even lower peak levels!
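
The counting rule is easy to sketch. The function below is a hypothetical illustration of an over-counting meter for 16-bit samples, with the number of contiguous full-scale samples as an adjustable threshold; it is not the firmware of any particular meter.

```python
FULL_SCALE_16 = 32767


def count_overs(samples, threshold=3, full_scale=FULL_SCALE_16):
    """Count runs of at least `threshold` contiguous samples at full scale.

    Full scale means the largest positive code or the largest negative code.
    Each run is reported once, at the moment it reaches the threshold.
    """
    overs = 0
    run = 0
    for s in samples:
        if s >= full_scale or s <= -full_scale - 1:
            run += 1
            if run == threshold:
                overs += 1
        else:
            run = 0
    return overs
```

With the default threshold, two full-scale samples in a row pass silently and three in a row register one OVER; raising the threshold to 4, 5 or 6 makes the meter progressively more forgiving.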

Using External A/D Converters or Processors 

There is no standard for communicating OVERs
on an AES/EBU or S/PDIF line. So if you're using an
external A/D converter, the recorder's OVER
indicator will probably not function properly, if at
all. Some external A/D converters do not have OVER
indicators, so in this case, there's no substitute for
an accurate external meter; without one I would
advise not exceeding -1 dB. I've already received
several overloaded tapes which were traced to an
external A/D converter that wasn't equipped with an
overload indicator.

When making a digital dub through a digital
processor you'll find that most do not have accurate
metering. Equalizer or filter sections can cause
OVERs even when dipping levels! Contrary to
popular belief, an OVER can be generated even if a
filter is set for attenuation instead of boost, because
filters can ring; they also can change the peak level
as the frequency balance is skewed. Digital
processors can also overload internally in a fashion
undetectable by a digital meter. Internal stages may
"wrap around" when they overload, without
transferring OVERs to the output. In those cases, a
digital meter is not a foolproof OVER detector, and
there's no substitute for the ear, but a good digital
meter will catch most other transgressions. When
you hear or detect an overload from a digital
processor, try using the processor's digital input
attenuator, or simply attenuate its output if you are
sure the processor has sufficient internal
headroom, explained later in this chapter.
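
A textbook case shows how pure attenuation can raise the peak. A full-scale square wave is a fundamental of amplitude 4/pi plus odd harmonics that happen to pull the crests back down to 1.0. Filter away the harmonics, with no boost anywhere, and what remains peaks about 2.1 dB higher. The sketch below uses an ideal square wave as a stand-in for heavily limited material and a brick-wall removal of harmonics as a stand-in for a filter; both are simplifying assumptions.

```python
import math


def square_partial_sum(theta, highest_harmonic):
    """Fourier series of a unit square wave, cut off at `highest_harmonic`."""
    return sum(4.0 / (math.pi * k) * math.sin(k * theta)
               for k in range(1, highest_harmonic + 1, 2))


def peak_db(highest_harmonic, points=4096):
    """Peak level, relative to the original square wave's peak of 1.0."""
    peak = max(abs(square_partial_sum(2 * math.pi * i / points, highest_harmonic))
               for i in range(points))
    return 20.0 * math.log10(peak)


fundamental_only_db = peak_db(1)
```

Here `fundamental_only_db` comes out near +2.1 dB: a signal that just touched full scale before the low-pass filter would be well OVER after it.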

MYTH OF THE MAGIC CLIP REMOVAL:

Turn it down after clipping and the clip
will go away.


Oversampled Meters: Even More Sophisticated

Reading the simple numeric code from the
digital stream may not be enough to detect OVERs in
the converters that reproduce that signal. During
the conversion from PCM digital to analog, built-in
low-pass filtering causes occasional peaks between
the samples that are higher than the digital stream's
measured level, or even higher than full scale.
Digital designers have known for years that the
actual output level of audio from a D/A converter
can exceed 0 dBFS, but very few have taken this into
account in the design. TC Electronic has performed
tests on typical consumer D/A converters,[3] showing
that many of them distort severely since their digital
filters and analog output stages do not have the
headroom to accommodate levels which exceed
0 dBFS! Besides D/As, certain processing elements of
the signal chain can distort with intersample peaks,
including sample rate converters and digital
equalizers, as we just explained. 0 dBFS+ peaks may
reach as much as +3 dBFS with certain types of
signals; what this means is that to make the cleanest
recordings and to be perfectly safe, you should
never exceed -3 dBFS on a simple
(non-oversampling) digital meter! To demonstrate the
problem, and since this goes against typical wisdom,
TC have developed an oversampling limiter and
special oversampling peak meter in the
System 6000.
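
The worst case is easy to reproduce. Take a sine wave at one quarter of the sample rate whose samples happen to fall 45 degrees either side of each crest: every sample reads exactly full scale, yet the waveform between the samples rises about 3 dB higher. The sketch below is only an illustration, assuming a Hann-windowed sinc interpolator and 4x oversampling; the function names and the filter are my own choices, not TC Electronic's meter.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def interpolate(samples, t, half_width=32):
    """Hann-windowed sinc reconstruction of the waveform at fractional index t."""
    total = 0.0
    k0 = math.floor(t)
    for k in range(k0 - half_width + 1, k0 + half_width + 1):
        if 0 <= k < len(samples):
            d = t - k
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            total += samples[k] * sinc(d) * window
    return total


def to_db(value):
    return 20.0 * math.log10(value)


# fs/4 sine sampled 45 degrees off its crests, scaled so samples hit full scale
samples = [1.0, 1.0, -1.0, -1.0] * 64
sample_peak = max(abs(s) for s in samples)

# 4x oversampled reading, taken from the middle of the block to avoid edge effects
oversampled = [interpolate(samples, i / 4.0) for i in range(4 * 64, 4 * 192)]
true_peak = max(abs(v) for v in oversampled)
intersample_excess_db = to_db(true_peak) - to_db(sample_peak)
```

A simple meter reads this signal as exactly 0 dBFS, while the oversampled reading lands roughly 3 dB above it, which matches the figure quoted above.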

Practice Safe Levels 

Although there have been no psychoacoustic
studies on their adversity, intersample 0 dBFS+
peaks cause some following processing circuits to
linger and extend the distortion, which makes
post-processing and broadcasting seriously
problematic.[4] And some critical listeners report
improvements when measured intersample OVERs
are eliminated. It makes sense for production
engineers to practice safe levels during recording
and mixing by staying well away from 0 dBFS on a
standard peak meter and leaving the decision on
whether and how to raise levels to the mastering
suite, where we make an educated decision.
Mastering engineers, if maximizing levels, should at
least use an over-counting meter, plus a digital
limiter whose ceiling is set to -0.3 dB (see Chapter
10), but preferably an oversampling limiter and
oversampled meter (to prevent downstream
problems with DACs and radio processing).
Clipping of any type is to be avoided, especially if a
recording is to undergo further processing, as
demonstrated in Appendix 1.[5]


The Myth of the Magic Clip Removal 

If the level is turned down by as little as 0.1 dB,
then a recording which may be full of OVERs will no
longer measure any overs. But this does not get rid
of the clipping or the distortion, it merely
prevents it from triggering the meter.
Some mastering engineers deliberately
severely clip the signal,
and then drop the level
slightly, so that the meters will not show any OVERs.
This practice, known as SHRED, produces very
fatiguing (and potentially boringly similar)
recordings.[6]
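
The trick is easy to expose in a few lines. This sketch (all names hypothetical, with an overdriven sine wave standing in for a clipped master) clips a signal, turns it down by 0.1 dB, and then asks two questions: how many samples sit at full scale, and how long is the longest flat top?

```python
import math

FULL_SCALE = 32767


def clip(sample):
    """Hard clip to the 16-bit range (kept symmetric for simplicity)."""
    return max(-FULL_SCALE, min(FULL_SCALE, sample))


def full_scale_count(samples):
    return sum(1 for s in samples if abs(s) >= FULL_SCALE)


def longest_flat_top(samples):
    """Longest run of consecutive samples stuck at the block's own peak."""
    peak = max(abs(s) for s in samples)
    longest = run = 0
    for s in samples:
        run = run + 1 if abs(s) == peak else 0
        longest = max(longest, run)
    return longest


overdriven = [clip(round(2.0 * FULL_SCALE * math.sin(2 * math.pi * i / 100)))
              for i in range(1000)]
gain = 10 ** (-0.1 / 20)                      # 0.1 dB down
turned_down = [round(s * gain) for s in overdriven]
```

After the 0.1 dB drop, `full_scale_count` falls to zero, so an over-counting meter stays dark, but `longest_flat_top` is unchanged: the squared-off waveform is still there, just slightly smaller.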

Peak Level Practice for Good 24-bit Recording 

Even though 24-bit recording is now the norm,
some engineers retain the habit of trying to hit the
top of the meters, which is totally unnecessary, as
illustrated in the figure below. Note that a 16-bit
recording fits entirely in the bottom 91 dB of the
24-bit. You would have to lower the peak level of a
24-bit recording by 48 dB to yield an effective 16-bit
recording! So there is
a lot of room at the bottom, and you won't lose any
dynamic range if you peak to -3 dBFS or even as low
as -10 dBFS; you'll end up with a cleaner recording.
Distortion accumulates,[7] and at the mastering
studio, a digital recording which is too hot can cause
a digital EQ or sample rate converter to overload. A
digital mix that peaks to -3 dBFS or lower makes it
easier to equalize and otherwise process without
needing an extra stage of attenuation in the
mastering.

[Figure: A 24-bit recording would have to be lowered in level
by 48 dB in order to reduce it to the SNR of 16-bit.
The noise floors shown are with flat dither.]

A number of 24-bit A/Ds advertise additional
headroom by employing a built-in compressor at the
top of the scale. As we have seen, there is no audible
improvement in SNR by maximizing a 24-bit
recording and no SNR advantage to compressing
levels with a good 24-bit A/D.
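
The 48 dB figure is simply eight bits at about 6 dB per bit. Here is the arithmetic as a small sketch; the helper names are my own, and the calculation is idealized, since it ignores dither and the analog noise of real converters.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB


def wordlength_range_db(bits):
    """Level span covered by `bits` bits, in dB."""
    return bits * DB_PER_BIT


def effective_bits(wordlength, peak_dbfs):
    """Bits still exercised when the loudest peak sits at `peak_dbfs`."""
    return wordlength + peak_dbfs / DB_PER_BIT


print(round(wordlength_range_db(24 - 16), 2))   # 48.16
print(round(effective_bits(24, -10.0), 1))      # 22.3
```

Even peaking as low as -10 dBFS, an ideal 24-bit recording still has more than 22 bits in play, far more than a 16-bit medium can hold.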

How Loud is It? 

Contrary to popular belief, the levels on a digital
peak meter have (almost) nothing to do with
loudness. For example, you're doing a direct to
two-track recording (some engineers still work that
way!) and you've found the perfect mix. Now, keep
your hands off the faders, and let the musicians
make a perfect take. During take one, the
performance reached -4 dB on the meter; and in
take two, it reached 0 dB for a brief moment during
a snare drum hit. Does that mean that take two is
louder? If you answered "both takes are about the
same loudness," you're probably right, because in
general, the ear responds to average levels, not
peak levels when judging loudness. If you raise the
master gain of take one by 4 dB so that it, too,
reaches 0 dBFS peak, it will now sound 4 dB louder
than take two, even though they both now measure
the same on the peak meter.
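
The two-takes story can be acted out with numbers. In this sketch a steady tone stands in for the performance and a single full-scale sample stands in for the snare hit; RMS level is used as a crude stand-in for what the ear averages, which is an assumption rather than a loudness model.

```python
import math


def peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


amplitude = 10 ** (-4.0 / 20)                    # peaks at -4 dBFS
take_one = [amplitude * math.sin(2 * math.pi * i / 48) for i in range(48000)]

take_two = list(take_one)
take_two[12] = 1.0                               # one brief hit at 0 dBFS

boosted_take_one = [s * 10 ** (4.0 / 20) for s in take_one]
```

The peak meter shows take two 4 dB hotter than take one while their average levels are practically identical. Raise take one by 4 dB and the peak readings match, but its average level is now 4 dB above take two's.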

Do not confuse the peak-reading meters on 
digital recorders with VU meters. Besides having a 
different scale, a VU meter has a much slower attack 
time than a digital peak meter. In Chapter 15 we 
will discuss loudness in more detail, but we can 
summarize now by saying that the VU meter 
responds more closely to the response of the ear. 

For loudness judgment, if all you have is a peak meter,
use your ears. If you have a VU, use it as a guide, not
an absolute, because it is still fairly inaccurate.

Did you know that an analog tape and digital 
recording of the same source sound very different in 
terms of loudness? Make an analog tape recording 
and a digital recording of the same music. Dub the 
analog recording to digital, peaking at the same peak 
level as the digital recording. The analog dub will 
sound about 6 dB louder than the all-digital 
recording, which is quite a difference! This is 
because the peak-to-average ratio of an analog 
recording can be as much as 12-14 dB, compared 
with as much as 20 dB for an uncompressed digital 
recording. Analog tape's built-in compressor is a
means of getting recordings to sound louder (oops,
did I just reveal a secret?).[8] That's why pop
producers who record digitally may have to
compress or limit to compete with the loudness of
their analog counterparts.


MYTH:

Normalization Makes the Song Levels Correct


The Myths of Normalization

The Esthetic Myth: Digital audio editing
programs have a feature called Normalization, a
semi-automatic method of adjusting levels. The
engineer selects all the segments (songs), and the
computer grinds away, searching for the highest
peak on the album. Then the computer adjusts the
level of all the material until the highest peak
reaches 0 dBFS. If all the material is
group-normalized at once, this is not a serious esthetic
problem, as long as all the songs have been raised or
lowered by the same amount. But it is also possible
to select each song and normalize it individually,
which is part of the esthetic mythology, and it's a real
no-no. If you're making an album, never normalize
individual songs. Since the ear responds to average
levels, and normalization measures peak levels, the
result can totally distort musical values. A
compressed ballad will end up louder than a rock
piece! In short, normalization should not be used
to regulate song levels in an album. There's no
substitute for the human ear, and currently there is
no artificial intelligence that does as well.*
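
Here is the esthetic myth in miniature. The two "songs" below are synthetic stand-ins of my own invention: a ballad with almost no peaks above its average level, and a rock piece with a moderately loud body plus sparse transients reaching full scale. RMS level is again used as a rough proxy for how loud each would seem.

```python
import math


def peak(samples):
    return max(abs(s) for s in samples)


def rms_db(samples):
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples))


def normalize(samples, reference_peak):
    """Scale so that `reference_peak` lands on full scale (1.0)."""
    gain = 1.0 / reference_peak
    return [s * gain for s in samples]


def tone(amplitude, n=4800, period=48):
    return [amplitude * math.sin(2 * math.pi * i / period) for i in range(n)]


ballad = tone(0.25)                       # compressed: peaks hug the average
rock = tone(0.40)
for i in range(0, len(rock), 480):        # sparse drum-like hits at full scale
    rock[i] = 1.0

# Group normalization: one gain for the whole album
album_peak = max(peak(ballad), peak(rock))
ballad_group, rock_group = normalize(ballad, album_peak), normalize(rock, album_peak)

# Individual normalization: each song to its own peak
ballad_solo, rock_solo = normalize(ballad, peak(ballad)), normalize(rock, peak(rock))
```

Group normalization leaves the rock piece about 4 dB above the ballad, as it was mixed. Individual normalization turns the ballad up by 12 dB and the rock piece not at all, so the ballad ends up several dB louder than the rock.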

The Technical Myth: It's also a myth that
normalization improves the sound quality of a
recording; in fact, it can only degrade it. Technically
speaking, normalization only adds one more
degrading calculation and resulting quantization
distortion. And since the material has already been
mixed, it has already been quantized, which
predetermines its signal-to-noise ratio; the SNR of the
recording cannot be further improved by raising it.
Let me repeat: Raising the level of the material will
not change its inherent signal-to-noise ratio but will
only add more quantization distortion in an
unnecessary step. If the material is going to be
mastered, do not normalize, since the mastering
engineer will be performing further processing
anyway.[9]

* When a client asks me if I normalize, I reply that I never use the computer's
automatic normalization method, but rather songs are leveled by ear. I avoid the
term normalization because it has been misused.

Judging Loudness the Right Way 

Since the ear is the only judge of loudness, is 
there any objective way to determine how loud your 
CD will sound? The first key is to use a single D/A 
converter to reproduce all your digital sources and 
maintain a fixed setting on your monitor gain. That
way you can compare your CD in the making against 
other CDs, in the digital domain. Judge DATs, CDs, 
workstations, and digital processors through this 
single converter. 

III. Calibrating Studio Levels: 

Headroom and Cushion 

Protecting your A/D and mix from clipping does
no good if your analog console, preamplifiers or
processors are distorting in front of the A/D! Since
mastering engineers usually chain multiple pieces
of gear, it's important to understand how to
optimize analog levels, distortion and noise when
making signal chains in front of your A/D converter.
Ostensibly, typical balanced analog gear has a
nominal level of +4 dBu (reference .775 volts),[10]
yielding 1.23 volts with sine wave. Unfortunately,
however, not all analog gear is created equal, and +4
dBu may be a bad choice of reference level. I use the
term nominal to mean the average voltage level that
corresponds with 0 VU, typically 20 dB below full
scale digital (0 dBFS). We need to examine some
easily overlooked factors when deciding on an
in-house standard analog (voltage) level.

One factor is the clipping point of consoles and
outboard gear. Before the advent of inexpensive
8-buss consoles, most professional consoles' clipping
points were +24 dBu or higher. But a frequent
compromise in low-priced console design is to use
internal circuits that clip earlier, around +20 dBu
(7.75 volts). This can be a big impediment to clean
audio, especially when cascading amplifiers. To
avoid the solid-state edginess that plagues a lot of
modern equipment, the minimum clip level of every
amplifier in a system should be 6 dB above the
potential peak level of the music. The reason: Many
opamps and other solid-state circuits exhibit an
extreme distortion increase long before they reach
the actual clipping point, as they change from class
A to class AB operation. This means the clipping point
should be at least +30 dBu (24.5 volts RMS) if 0 VU
is +4 dBu!
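
The voltages quoted in this section all follow from the definition of the dBu, whose 0 dB reference is about .775 volts RMS (the voltage that dissipates 1 milliwatt in 600 ohms). A short conversion sketch, with function names of my own choosing:

```python
import math

DBU_REFERENCE_VOLTS = math.sqrt(0.001 * 600.0)   # about 0.7746 V RMS


def dbu_to_volts(dbu):
    """RMS voltage corresponding to a level in dBu."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20.0)


def volts_to_dbu(volts):
    """Level in dBu corresponding to an RMS voltage."""
    return 20.0 * math.log10(volts / DBU_REFERENCE_VOLTS)


for level in (4, 20, 24, 30):
    print(f"{level:+d} dBu = {dbu_to_volts(level):.2f} V RMS")
```

This gives roughly 1.23 volts at +4 dBu, 7.75 volts at +20 dBu and 24.5 volts at +30 dBu, so every 6 dB of extra clipping headroom asks the output stage for about twice the voltage swing.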

You Can Never Have Enough Headroom!

A lot of solid-state designs start to sound pretty
nasty when used near their clip point.[11] All other
things being equal, the amplifier with the higher
clipping point will sound better. Perhaps that's why
tube equipment (with its 300 volt B+ supplies and
headroom 30 dB or greater) often has a good name
and solid-state equipment with inadequate power
supplies or headroom has a bad name. Most of the
robust-sounding solid-state equipment I know uses
very high power (but very expensive) supply rails.

Traditionally, the difference between average
level and clip point has been called the headroom,
but in order to emphasize the need for even more
than the traditional amount of headroom, I'll call
the space between the peak level of the music and
the amplifier clip point a cushion. With analog tape,
a 0 VU reference of +4 dBu with a clipping point of
+20 dBu provided reasonable amplifier headroom,
because musical peak-to-average ratios were
reduced to the compression point of the tape, which
maxes out at around 14 dB over 0 VU. Instead of
clipping, analog tape's gradual saturation curve
produces 3rd and 2nd harmonics, much gentler on
the ear than the higher order distortions of
solid-state amplifier clipping.

MYTH: +4 dBu is always the best level to use for 0 VU
with balanced analog electronics.

But it's a different story when the peak-to-average
ratio of raw, unprocessed digital audio tracks can be
20 dB. Adding 20 dB to a reference of +4 dBu results in
+24 dBu, which is beyond the clipping point of many
so-called professional pieces of gear, and so doesn't
leave any room at all for a cushion. If you adapt an
active balanced output to an unbalanced input, the
clipping point reduces by 6 dB, so the situation
becomes proportionally worse.¹² Dual-output consoles
that are designed to work at either professional or
semi-pro levels can be particularly problematic. To
meet price goals, manufacturers often compromise on
headroom in professional mode, making the so-called
semi-pro mode sound cleaner! It is an unpleasant
surprise to discover that many consoles clip at
+20 dBu, meaning they should not be using a
professional reference level of +4 dBu (headroom of
only 16 dB and no cushion). Even if the console clips
at +30 dBu (the minimum clipping point I recommend),
that only leaves a 6 dB cushion when reproducing music
with 20 dB peak-to-average ratio. That's why more and
more high-end professional equipment has clipping
points as high as +37 dBu (55 volts!). To obtain that
specification, an amplifier must use very high output
devices and high-voltage power supplies. Translation:
better sound (all other things being equal), and also
higher cost due to the need for more robust power
supplies and devices.
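The headroom and cushion arithmetic above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping; the function names are my own and not part of any standard.

```python
def headroom_db(clip_dbu, reference_dbu):
    """Traditional headroom: distance from the 0 VU reference to the clip point."""
    return clip_dbu - reference_dbu


def cushion_db(clip_dbu, reference_dbu, peak_to_average_db):
    """Cushion: distance from the music's peak level to the clip point."""
    peak_dbu = reference_dbu + peak_to_average_db
    return clip_dbu - peak_dbu


# Cases from the text, all with 0 VU = +4 dBu and a 20 dB peak-to-average ratio.
console_20 = cushion_db(clip_dbu=20, reference_dbu=4, peak_to_average_db=20)
console_30 = cushion_db(clip_dbu=30, reference_dbu=4, peak_to_average_db=20)
console_37 = cushion_db(clip_dbu=37, reference_dbu=4, peak_to_average_db=20)

print(headroom_db(20, 4))  # headroom of a console that clips at +20 dBu
print(console_20)          # negative: the peaks are beyond the clip point
print(console_30)          # cushion with a +30 dBu clip point
print(console_37)          # cushion with a +37 dBu clip point
```

A negative cushion means the peaks of the music would already be clipping.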

These robust output drivers that have this kind
of headroom sound better if they can deliver a clean
high level into a 600 ohm load, which means they
can probably handle long cable runs with their high
capacitive loads. Long runs should probably be
balanced, but since many mastering studios have
small ground-loop areas, we often use custom-made
unbalanced equipment, which often has simpler,
quieter circuitry.

One of the most common mistakes made by digital
equipment manufacturers is to assume that, if the
digital signal clips at 0 dBFS, then it's OK to
install a (cheap) analog output stage that would clip
at a voltage equivalent to, say, 1 dB higher. This
almost guarantees a nasty-sounding converter or
recorder, because of the lack of cushion in its analog
output section and the potential for 0 dBFS+ levels.



How can we increase the cushion in our system,
short of replacing all our distribution amplifiers and
consoles with new ones? One way to solve the
problem is to recalibrate all the VU meters. SNR will
not be significantly lost if we set 0 VU = 0 dBu or
even -4 dBu (not an international standard, but a
decent compromise if we don't want to throw out
equipment), and things will sound cleaner in the
studio. Once we've decided on a standard analog
reference level, we calibrate all analog-driven VU
meters to this level. At left is a diagram describing
the concept of cushion.


IV. Gain Staging—Analog and Digital 



In the top device, signal enters a passive attenuator and exits through an
active amplifier stage. This circuit effectively has infinite input headroom.
The bottom device's input headroom is determined by the headroom of the
input amplifier.



Analog Signal Chains 

Now that we know how to choose an analog 
level, it’s time to chain our equipment together. To 
really get a handle on our equipment, we should 
determine its internal structure. The above figures 
represent two possible internal structures. All 
structures are variations on this theme. 


To properly test analog devices and determine
their internal makeup, use a good clean monitor
system, an oscilloscope, a digital voltmeter and a
sine wave generator that can deliver a clean +24 dBu
or higher (a tough requirement in itself). The first
type of device has a passive attenuator on its input,
which means that we can feed it any reasonable
source signal without fear of overload. We can prove
this by turning the generator up and attenuator
down; if the output never clips within a reasonable
range of the generator, then the device must have a
passive attenuator on its input. Then, we disconnect
the generator and listen to the output of the device
as we raise and lower the attenuator. There should
be no change in noise or hiss, and the output noise
should be well below -70 dBu unweighted,
preferably below -90 dBu A-weighted. This also is
an indication that the device has a passive
attenuator on its input. If the output noise changes
significantly at intermediate positions of the
attenuator, then the internal impedances of the
circuit are in question, or there may be some DC
offset. The output noise of this device will be limited
by the noise floor of its output amplifier. We
determine the best nominal operating level of this
device by taking the output clip point and subtracting
at least 26 dB for headroom and cushion.

The second type of device's input is an active
amplifier stage, whose design is much more critical.
It is very rare to find a solid-state device built this
way that won't clip with >+24 dBu input. While raising
the signal generator, turn down the attenuator to keep
the output from overloading. If we hear clipping prior
to the generator reaching +24 dBu, then the device has
a weak internal signal path. The clip point determines
the nominal analog input level, which should be at
least 26 dB below this clip point. Then, to check if
the device's internal gain structure is well balanced,
we see if the output stage clips at the same point as
the input stage or at a higher level.
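As a rough sketch of the rule of thumb above, the nominal level is simply the measured clip point less at least 26 dB. The function names and the example clip points are illustrative assumptions, not measurements of any particular device.

```python
def nominal_level_dbu(clip_dbu, margin_db=26.0):
    """Highest recommended nominal level: clip point minus headroom and cushion."""
    return clip_dbu - margin_db


def gain_structure_balanced(input_clip_dbu, output_clip_dbu):
    """True when the output stage clips at the same point as the input stage
    or at a higher level."""
    return output_clip_dbu >= input_clip_dbu


print(nominal_level_dbu(30))   # a device that clips at +30 dBu
print(nominal_level_dbu(24))   # a device that clips at +24 dBu
print(gain_structure_balanced(input_clip_dbu=24, output_clip_dbu=30))
```

Seen this way, a device that clips at +24 dBu should be run at a nominal level a little below 0 dBu rather than at +4 dBu.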

When cascading analog gear, the signal-to-noise
ratio and headroom of the cascade is determined by the
weakest link, but by studying the internal structure
of each piece, it may be possible to increase SNR of
the chain by running higher levels at points in the
chain that have higher clipping levels. With test tone
and then music, listen closely to the noise floor and
high level sound quality at the last device in the
chain; if the output of the chain sounds good and
reasonably quiet, then I don't worry about tweaking
the chain. I was able to improve the signal to noise
ratio of a tube-based tape recorder whose gain
structure resembles the second device. The original
manufacturer's conservative schematic specified
nominal internal levels of -10 dBu at the output of
the second active stage. But since the tubes distort
at well above +30 dBu (headroom of 40 dB), I decided
to run the attenuator higher and run levels of 0 dBu
in the second stage. This improved amplifier signal to
noise ratio from the second stage on, by 10 dB,
without endangering distortion. The tube tape
recorder still has 30 dB of internal headroom.
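The tape recorder example is plain decibel bookkeeping. Here is a minimal sketch, assuming the noise of the following stages stays constant when the level feeding them is raised; the function name is hypothetical.

```python
def restage(clip_dbu, old_level_dbu, new_level_dbu):
    """Return (SNR improvement, old headroom, new headroom) in dB
    when a stage's nominal level is moved."""
    snr_gain = new_level_dbu - old_level_dbu
    old_headroom = clip_dbu - old_level_dbu
    new_headroom = clip_dbu - new_level_dbu
    return snr_gain, old_headroom, new_headroom


# Second stage of the tube recorder: distortion point +30 dBu,
# nominal level moved from -10 dBu to 0 dBu.
print(restage(clip_dbu=30, old_level_dbu=-10, new_level_dbu=0))
```

Every dB of SNR gained this way is a dB of headroom spent, so the trade only makes sense where headroom is plentiful.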

In an analog signal chain, raising the music
signal level as high as practical as early as possible
(within the limits imposed by headroom and
clipping point of A/D converters) will improve the
signal to noise ratio of the entire chain. Then, later
in the mastering, we will reduce the signal level
digitally in the digital chain that follows.

Digital Signal Chains 

Headroom of the Chain: It's a lot harder to
grasp what's going on inside a digital signal chain,
but we can test digital performance for headroom,
clipping, and noise. Suppose we have a digital
equalizer with several gain controls and
equalization; we feed it a 1 kHz sine wave test tone at
about -6 dBFS and turn up the 1 kHz equalization by
10 dB, observing that the output clips. Then we turn
down the output gain control until the output is
below 0 dBFS and verify by listening or FFT
measurements that the internal clipping goes away.
If not, then the internal gain structure of the
equalizer does not have enough headroom to handle
wide range inputs. We may be able to get away with
turning down an input attenuator, but the early
clipping indicates that this equalizer is not
state-of-the-art. It is probably a first-generation
fixed point unit and should be replaced. Modern-day
digital processors have enough internal headroom to
sustain considerable boost in early stages without
needing an input attenuator, and clipping can be
removed solely by turning down the output
attenuator. The internal structure could be
double-precision fixed point or floating point (see
Glossary, Appendix 13); it's not easy to tell without
asking the manufacturer. It is easy to be impressed by
floating-point manufacturers' claims of hundreds of dB
of headroom above 0 dBFS, but 24 dB or so internal
headroom above 0 dBFS is probably enough; most
well-designed fixed-point products have 24 or more
dB internal headroom.
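The equalizer test can be imitated with a toy model. This is a sketch under simplifying assumptions: the EQ boost is treated as plain gain at the test frequency, the processor is modelled as a hard clipper at its internal ceiling followed by an output trim, and the function names are my own.

```python
import math


def db_to_gain(db):
    """Convert decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


def run_equalizer(internal_headroom_db, tone_dbfs=-6.0, boost_db=10.0,
                  output_trim_db=-6.0, cycles=10, samples_per_cycle=48):
    """Boost a sine tone, clip at the internal ceiling, then trim the output.

    Returns (output peak in dBFS, whether internal clipping occurred).
    """
    ceiling = db_to_gain(internal_headroom_db)  # 0 dB means no internal headroom
    clipped = False
    peak = 0.0
    for i in range(cycles * samples_per_cycle):
        x = db_to_gain(tone_dbfs) * math.sin(2 * math.pi * i / samples_per_cycle)
        y = x * db_to_gain(boost_db)            # the EQ boost
        if abs(y) > ceiling:                    # internal overload
            clipped = True
            y = math.copysign(ceiling, y)
        y *= db_to_gain(output_trim_db)         # output gain control
        peak = max(peak, abs(y))
    return round(20 * math.log10(peak), 2), clipped


print(run_equalizer(internal_headroom_db=0))    # first-generation style unit
print(run_equalizer(internal_headroom_db=24))   # unit with 24 dB internal headroom
```

In both cases the output meter reads below 0 dBFS after the trim, but only the unit with internal headroom delivers an unclipped sine wave; the other has already flattened the waveform before the output control.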

Distortion of the Chain and Individual
Processor Levels: With a digital chain, we no
longer have to consider the audio signal level
between the various items of equipment; raising
the source signal in a 24-bit digital signal chain
does not make a meaningful SNR difference,
considering the inaudible (approximately -139
dBFS) noise of the chain.* No longer should we get
hung up on having a low signal level; instead,
consider every calculation as a source of
quantization distortion. Instead of optimizing levels,
what matters most in a 24-bit digital chain is to
reduce the number of total calculations; give the job
of gain changes and other calculations to the
components with the highest internal resolution (i.e.,
those which would introduce the least quantization
distortion or grunge). In fact, we should avoid
raising the signal until it reaches a device with the
cleanest-sounding gain control, even if the source
audio level is very low. For example, if the
workstation has lower resolution, we try to hold
everything at unity gain in the DAW and reserve the
gain changes or EQ for higher-precision devices
later in the signal chain. In other words, pass a
perfect clone (bit-transparent copy) of the source
from the DAW onto the next device in line to do
processing.

Noise of the Chain: The only significant noise
floors in a 24-bit chain are not from the chain itself
but from the original sources, including mike
preamp noise. We are primarily concerned with the
impact of the summing of the higher level noises,
and summing a new 16-bit dither with the source's
dither noise can add a veil if the original was 16-bit.

* Each processor does add its own quiescent or idle noise, which is cumulative,
but in a good chain rarely adds more than 3 to 6 dB to the -139 dBFS RMS
noise floor.

[Figure: 10 dB gain + new dither] A 16-bit recording with peak level low at
-10 dBFS. When gain is raised 10 dB and redither is added, the original 81 dB
signal to noise ratio is reduced by about 0.4 dB.

Let's take an example of a 16-bit recording
whose peak level is 10 dB low, as in the above figure.
In mastering we may choose to raise its level by 10
dB and add 16-bit dither before turning it into a
16-bit CDR. This 16-bit recording's original 81 dB SNR
is the difference between signal at -10 dBFS and
dither noise at -91 dBFS.¹³ When we raise the signal
by 10 dB, both the original signal and the noise are
raised equally, so the original signal to noise ratio is
almost unchanged. However, the total noise is the
sum of the original dither, which is now at -81 dBFS,
and the new dither, which is at -91 dBFS. We ignore
the insignificant noise of the gain processing, well
below -130 dBFS. Because the two dithers are
uncorrelated, they sum as powers, so the total is
-80.6 dBFS, and the SNR of the source has deteriorated
by (81 - 80.6) or 0.4 dB. The more gain we apply to the
source, the farther the old noise will sit above
the added dither noise, and the smaller the
contribution of the new dither when the two noises are
summed. So, reconsider doing anything if you have to
raise a signal by only a few dB, because the new dither
will be very close to the old; if we perform no gain
change and just add dither, the noise floor is raised
by 3 dB. If we lower the gain, the new dither
predominates over the old. Despite this
degradation, many times we have to live with
compromises in mastering, since we still receive
16-bit sources, and we are forced to adjust the level
according to the esthetics of the album. I've had
considerable luck reducing cumulative sonic veiling
by using noise-shaped dither.¹⁴
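The dither arithmetic can be checked with a short sketch. It assumes the two dither noises are independent, so they add as powers, and it uses the chapter's simplified dither floor of 6 dB per bit less 4.77 dB. The function names are illustrative only.

```python
import math


def dither_floor_dbfs(bits):
    """Wideband RMS dither noise using the simplified arithmetic of this
    chapter: 6 dB per bit, less 4.77 dB."""
    return -(6.0 * bits - 4.77)


def power_sum_db(*levels_db):
    """Sum of uncorrelated noises: add powers, then convert back to dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def amplitude_sum_db(*levels_db):
    """Sum of fully correlated signals: add amplitudes. Shown only for
    contrast; it does not apply to independent dither noises."""
    return 20 * math.log10(sum(10 ** (level / 20) for level in levels_db))


print(round(dither_floor_dbfs(16), 1))        # 16-bit dither floor
print(round(dither_floor_dbfs(24), 1))        # 24-bit dither floor
print(round(power_sum_db(-81, -91), 1))       # old dither raised 10 dB + new dither
print(round(power_sum_db(-91, -91), 1))       # no gain change, just redither
print(round(amplitude_sum_db(-81, -91), 1))   # what amplitude addition would give
```

The second-to-last case shows the 3 dB penalty of redithering without any gain change, and the last line shows how much the penalty is overstated if the two noises are added as amplitudes instead of powers.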

The manufacturers of the Waves L2 claim that
peak limiting allows raising level enough to be
significantly above the dither noise, and thus
increases the signal-to-dither ratio and resolution.
But exercise caution, because to my ears the
apparent noise improvement is more than offset by
the degradation of sound quality (the limiter
reduces transient clarity).

If we could avoid 16-bit dither, by producing an
output at 24-bit that the consumer could use, then
mastering processing and gain-changing could be
performed with no significant penalty, with noise
floor 48 dB below the noise of 16-bit. This is the
promise of delivering higher wordlengths to the
consumer and another reason to record in 24-bit in
the first place.


V. Analog to Digital Dubbing 
and Transfers 

Dubbing and Copying—Translating between analog 
and digital points in the system 

Let's discuss the interfacing of analog devices
equipped with VU meters and digital devices
equipped with digital (peak) meters. When you
calibrate a system with sine wave tone, what
translation level should you use? There are several
de facto standards. Common choices have been -20
dBFS, -18 dBFS, and -14 dBFS translating to 0 VU.
That's why some DAT machines have marks at -18
dB or -14 dB. I'd like to see accurate calibration
marks on digital recorders at -12, -14, -18, and -20
dB, which covers most bases. Most of the external
digital meters provide means to accurately calibrate
at any of these levels.

How do you decide which standard to use? Is it
possible to have only one standard? What are the
compromises of each? To make an educated decision,
ask yourself: What is my system philosophy? Am I
interested in maintaining headroom and avoiding
peak clipping or do I want the highest possible
signal-to-noise ratio at all times? Am I interested in
consistent loudness? Do I need to simplify dubbing
practices or am I willing to require constant
supervision during dubbing (operator checks levels
before each dub, finds the peaks, and so on)? Am I
adjusting levels or processing dynamics, mastering
for loudness and consistency with only secondary
regard for the peak level?

Consider that pure, unprocessed digital
sources, particularly uncompressed individual
tracks on a multitrack, will have peak levels 18 to 20
dB above 0 VU, whereas typical mixdowns will have
peak-to-average ratios of 14 to 18 dB (rarely up to
20). Analog tapes will have peak levels up to 14 dB,
almost never greater. And that's how the three most
common choices of translation numbers (18, 20,
and 14) were derived. That's also why each
manufacturer's DAT recorder has a different analog
output level, which makes it a pain to interface in a
fixed installation.
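To see how each translation level treats each kind of source, here is a small sketch. The source names and peak-to-average figures are the upper values quoted above; the function names are my own.

```python
def peak_dbfs(reference_dbfs, peak_to_average_db):
    """Digital peak reached by material that averages 0 VU."""
    return reference_dbfs + peak_to_average_db


def clips(reference_dbfs, peak_to_average_db):
    """True if the peaks would exceed digital full scale."""
    return peak_dbfs(reference_dbfs, peak_to_average_db) > 0


sources = {"analog tape": 14, "typical mixdown": 18, "raw digital track": 20}

for reference in (-20, -18, -14):
    for name, crest in sources.items():
        status = "CLIPS" if clips(reference, crest) else "ok"
        print(f"0 VU = {reference} dBFS, {name}: "
              f"peak {peak_dbfs(reference, crest)} dBFS, {status}")
```

Only the -20 dBFS reference accommodates all three kinds of source, at the price of leaving the top 6 dB unused when the source is analog tape.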

Broadcast Studios 

In broadcast, speed and practicality are our object,
simplifying day-to-day operation, especially if the
consoles are equipped with VU meters and
recorders are digital. In broadcast studios, it is
desirable to use fixed, calibrated input and output
gains on all equipment. My personal recommendation
for the vast majority of broadcast studios is to
standardize on reference levels of -20 dBFS = 0 VU,
particularly when mixing to 2-track digital from live
sources or tracking live to multitrack digital. With a
-20 dBFS reference, you will probably never clip a
digital tape if you watch the VU. If the sources are
compressed, the peak level may never reach full
scale, but the SNR losses are insignificant with
24-bit recording. Use the top of the peak scale for
headroom.

When dubbing from analog tape to digital,
consider the analog tape to be a compressed source,
and retain the VU reference at -20 dBFS, even if the
digital never peaks above -6 dBFS. This will result
in more consistent levels throughout the plant.
When dubbing from digital to analog, optionally
consider a -14 reference to avoid saturating the
analog tape, or use a high headroom analog tape at
high speed, or simply accept the 6 dB or so analog
tape compression that we've been enjoying for
years. For the major A/D/A converters in the
complex, European broadcasters have settled on a
-18 reference, since most of the material will have
18 dB or lower peak-to-average ratio, and occasional
clipping may be tolerated. I prefer the 20 dB choice
to reduce clipping.

Recording Studios 

For a busy recording studio that does most of its
mixing, recording and dubbing to digital tape,
standardizing on -20 dBFS will simplify the process
and avoid clipping when watching VUs. When
making dubs to analog tape for archival purposes,
choose a tape with more headroom, or use a custom
reference point (e.g. -14 instead of -20), as the goal
is to preserve transients on the analog tape for the
enjoyment of future listeners. For archival
purposes, I prefer to use the headroom of the new
high-output tapes for transient clarity, rather than
to jack up the flux level for a better signal-to-hiss
ratio.

One of the biggest problems in the contemporary
recording studio is dealing with playback of
CDs and the VU meter on the console, because many
contemporary CDs have loudness levels that would
damage a mechanical VU meter by pinning it, no
matter what standard level you decide to calibrate
the meter to. Some recording studios solve this
problem by switching the bus meter off when
playing back commercial CDs, or by adding in a
variable meter attenuator, which I think is
dangerous because they may forget to return the
attenuator to normal. The K-System Meter (see
Chapter 15) is the 21st century approach to the
problem.

Mastering Studios 

Mastering studios are working more frequently
in 20-bit or 24-bit. And we can engage in a custom
dubbing level for each analog tape, optimizing the
level of the transfer according to sound quality, so
fixed reference levels or calibration points for
transfer are less important to us.

Analog PPMs 

Analog PPMs have a slower attack time than
digital PPMs, 6 to 10 ms instead of 1 sample (22 µs
at 44.1 kHz). When working with a digital recorder,
a live source, and desk equipped with analog PPM, I
suggest a 5 dB "lead." In other words, align the
highest peak level on the analog PPM to -5 dBFS
(true peak) with sine wave tone.
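A quick sketch of the timing behind the suggested lead. It assumes, as a simplification, that the analog PPM under-reads fast transients by no more than the chosen lead; the function names are hypothetical.

```python
def sample_period_us(sample_rate_hz):
    """Duration of one sample in microseconds."""
    return 1e6 / sample_rate_hz


def samples_in(milliseconds, sample_rate_hz):
    """Number of samples that pass during a given attack time."""
    return round(milliseconds * sample_rate_hz / 1000)


def worst_case_true_peak_dbfs(ppm_reading_dbfs, lead_db=5.0):
    """Estimated true digital peak for a reading on an analog PPM that was
    aligned with sine wave tone, assuming it under-reads by up to lead_db."""
    return ppm_reading_dbfs + lead_db


print(round(sample_period_us(44100), 1))   # one sample at 44.1 kHz, in microseconds
print(samples_in(6, 44100))                # samples during a 6 ms attack
print(samples_in(10, 44100))               # samples during a 10 ms attack
print(worst_case_true_peak_dbfs(-5.0))     # top of the aligned analog scale
```

Hundreds of samples can go by before the analog meter fully responds, which is why a transient that reads at the top of the analog scale may already be at digital full scale.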

In Conclusion

With this firm decibel foundation, we’re now 
ready to begin discussing our mastering tools and 
techniques. 


1 Ironically, there's still a tiny disagreement as to which numeric code to read,
depending on the wordlength involved. Fortunately, a gentleman's agreement
has been to use only the top 16 bits to determine level. Full scale 16 bits
(positive going, 2's complement) is represented by the number 0111 1111 1111
1111. However, this number is infinitesimally smaller than full scale (positive)
24 bits, 0111 1111 1111 1111 1111 1111. To be exact, the difference is an error of
(only) 0.0001 dB, and most people have agreed to ignore the discrepancy!

2 The manufacturers of the Benchmark A/D converter believe that counting
contiguous samples is not a good idea, and they apply an even more
conservative standard of any sample hitting 0 dBFS being considered an OVER,
since an over-counting meter will never detect multiple contiguous high
frequency signals at 0 dBFS because they're faster than the sample rate. I
retort with the psychoacoustic argument that: a) high frequency signals (e.g.
10 kHz) at full scale do not occur in real music and b) the ear is far less
sensitive to short-duration high-frequency overloads. But still, there's nothing
wrong with being conservative, especially during initial A/D conversion and
especially with 24-bit recording!

3 Nielsen, Soren & Lund, Thomas (2000) 0 dBFS+ Levels in Digital Mastering.
AES 109th Convention, Preprint #5251.

4 Jim Johnston (in correspondence) points out that processors such as MPEG
coders (MP3), Dolby Digital encoders (AC3), WMA, Real, etc. will add noise to
your signal. If you get too close to the edge, they will distort badly unless the
input level is first reduced. The moral of the story is do not get too close to
digital max! JJ recommends a maximum peak level at or lower than -0.2 dBFS
for the benefit of post-processing.

5 Thomas Lund of TC has investigated a number of modern-day pop albums with
the oversampled peak meter. He observes that most CD players are still in a
distorted mode 200-700 ms after being hit by such peaks, as are radio
processors because of SRC on their inputs, phase rotators, and other generally
applied tricks.

6 Glenn Meadows and others discuss glired. on the Mastering Webboard:

Glenn: "Here's where I think all this is coming from, and it's kids oriented.
Ever pull up to a stop light, and get blasted from the car next to you? (I assume
the answer is yes). Well, besides being aggravated, actually listen to what's
going on. ALL of the audio is clipped and distorted on the high end. THAT's
what people THINK things sound like, and are SUPPOSED to sound like.

So, for the artists and producers, who are used to "cranking it up in their cars,"
and having the top and transients clipped/distorted, if they DON'T hear that in
their offices, then the mastering is just plain wrong. So, it's once again filtering
back to the mix engineers, to provide that bastí in the mix to satisfy their
clients (remember, we ALL have to satisfy our clients first and foremost), so
instead of losing the gig to someone else who WILL provide that edge, everyone
is doing the same thing."

[Unknown respondent:] In other words, you are stating that the music
business is currently conducted by people who don't know what a record
should sound like.

Glenn: "You got it. Clean is OUT, distorted is in. If it's clean, it's not right.
Unfortunately, I've had too many sessions go that way in the past few months."

Chris Johnson: "There's no future in that... clipping causes ear fatigue. Ear
fatigue means listeners listen less before ceasing the listening. These people
are only committing commercial suicide by going for stuff with no longterm
sales capacity. It's just the same as if you put everything through an Aural
Exciter turned up so far it really HURT, only this time around it's distortion."

7 You don't always get the best Telco engineers on broadcast remotes. During a
TV outside broadcast, I once complained to Telco, and he replied, "The
distortion is leaving here ok!" Another time, during level testing, Telco asked
me to "send me another one of those cycles."

8 As much of the "compression" of analog tape comes from the generation of
additional harmonics as from the level saturation effect. A harmonic generator
will reduce the peak to average ratio of a recording.

9 If perchance you decide to do a remix, and your previous mix revision was
mixed at a low level, then by all means remix at a higher level. This is a good
thing. Since the mixing process is a necessary (re)quantization step, this sort
of "normalization" will raise the signal to noise ratio of the material, especially
if you are mixing via analog console. With an analog mix, raising the level of the
mix increases SNR by raising the level of the mix signal above the noise floor of
the mixdown analog electronics and A/D. If you are mixing digitally, raising the
signal level increases the signal above the quantization distortion of the digital
mixing DSP. But since the quantization distortion in a state-of-the-art DSP
mixer will be around -139 dBFS, don't worry about raising the mix level unless
it is significantly low (let's say, -10 dBFS to be conservative), for there will be
no audible SNR improvement.

10 The origin of using +4 dBu as a reference for analog audio instead of a more
convenient number like 0 goes back to the earliest days of the telephone
company. The decibel is a relative measurement, but the reference used by the
telephone company was based on power. And the telephone company's
standard reference for 0 dB is one milliwatt, which across their standard
impedance of 600 ohms yields 0.775 volts. This reference is commonly
abbreviated as 0 dBm. The VU meter then came along; it is calibrated to
produce a level of 0 VU with 0 dBm, but if put across the 600 ohm line directly
it would load it down and cause distortion, so the standard circuit included a
3600 ohm resistor in series with the VU meter. The 3600 ohm resistor
attenuates the meter by 4 dB, so the circuit level has to be raised to +4 dBm in
order to make the meter read 0 VU.

Nowadays, modern-day equipment generally has low impedance outputs
(sometimes as low as 10 ohms or less), and high impedance inputs (greater
than 10 k ohms), so there is no meaningful power transferred from gear to
gear. Instead, a voltage reference is the only thing that is meaningful. And to
keep using the same decibel levels we used for telephony, we kept the historical
reference of 0.775 volts instead of a more convenient number like 1 volt! Now
when the dB is referred to a voltage of 0.775 volts, we call that 0 dBu. And to
make a VU meter read 0 in a modern low impedance circuit with the right
resistors, we have to feed it +4 dBu, or 1.23 volts. Also see Appendix 5, which is
a short table of decibels.

The equations are:

If 0 dBu is 0.775 volts, then +4 dBu is 1.23 volts: 20 * log (1.23/0.775) = 4.

I thank Mike Collins for reminding me to include this explanation.

11 This is of course dependent on the skill of the designer. Some IC operational
amplifiers change from class A to class AB as they approach their clipping
point, which can explain the sonic "nasties." However, many MOSFET power
amplifier designs clip gracefully. Similarly, power supply design and
regulation has a lot to say about sound quality near the clipping point. To avoid
those nasties, measure and listen to be safe.

12 To be more exact, headroom is reduced 6 dB if you unbalance a transformerless
amplifier's output. Transformer-coupled amplifiers retain their headroom
even if unbalanced.

13 Simplifying the arithmetic, we assume the peak level is at -10 dBFS RMS and
the dither noise is wideband and also RMS-measured at -91 dBFS (rounded
from 96 - 4.77 = 91.2). Anyway, chances are the music and room noise on the
DAT are much higher than this dither noise, but the dither noise is the absolute
minimum noise floor to consider. And many mastering engineers claim we can
hear the degradation of dithering, even at as low a noise floor as -91 dB and
even under music levels which are much higher!

14 You may ask: Other than the esthetic job of matching one song to another, why
are we bothering to raise the level of the recording if the SNR of the source is
worsened by the added dither? We also have to consider the noise floor of the
final output electronics and D/A converter, and it is possible that by peaking
closer to full scale we may overcome some of the weaknesses of the
reproduction system's noisy analog outputs. It's a matter of finding the right
balance and compromise amongst these several factors.
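The voltage arithmetic of footnote 10 can be sketched as follows. The reference voltage is derived from one milliwatt into 600 ohms; the function names are my own.

```python
import math

# 1 milliwatt across 600 ohms: V = sqrt(P * R), about 0.775 volts.
DBU_REF_VOLTS = math.sqrt(0.001 * 600)


def dbu_to_volts(dbu):
    """RMS voltage for a level expressed in dBu."""
    return DBU_REF_VOLTS * 10 ** (dbu / 20)


def volts_to_dbu(volts):
    """Level in dBu for an RMS voltage."""
    return 20 * math.log10(volts / DBU_REF_VOLTS)


print(round(DBU_REF_VOLTS, 3))        # the 0 dBu reference
print(round(dbu_to_volts(4), 2))      # +4 dBu
print(round(dbu_to_volts(30), 1))     # +30 dBu clipping point
print(round(dbu_to_volts(37)))        # +37 dBu clipping point
print(round(volts_to_dbu(1.23), 1))   # back from volts to dBu
```

The same two functions reproduce the clipping-point voltages quoted in the main text of the chapter.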




Chapter 6

Monitoring


I. Philosophy of Accurate Monitoring


The major goal of a professional mastering 
studio is to make subjective judgments as 
objectively as possible. You cannot afford to make 
mistakes when a record is released to thousands of 
listeners. Many of my clients are surprised to learn 
that a well-mastered CD can sound warm and clear
on a wide range of systems, from low-end to high- 
end. How can this be done without compromising 
the integrity of the sound? Perhaps surprisingly, the 
answer lies less in using the right processing and EQ 
techniques (though these are the key), and more in 
the intelligent use of an accurate, high resolution 
monitoring system. 


Elements of a High-Resolution Monitor System 

A high-resolution monitor system is the
mastering engineer's audio microscope, without
which subtle processing decisions cannot even
begin to be made. The monitor system permits
hearing inner details in the music that otherwise
might be missed, and might then cause problems for
the end listener.

The recipe for constructing a high-resolution
monitor system probably hasn't been written, but
we can describe some of the general elements:

"The mastering engineer's monitor
system is an audio microscope"


1. With few exceptions, near-field monitors will not
be found in a professional mastering room.¹ There
are no little speakers, no representative cheap
speakers, no alternative monitors. Instead, there is
a single pair of high quality loudspeakers (for
stereo work), with which the mastering engineer
is intimately familiar. He knows exactly how their
performance will translate to the real world, and
please the maximum number of listeners.

2. The mastering room is extremely quiet, with all
noise-producing equipment banished to the
machine room. Noise floor must be better than
NC 30,² preferably NC 20 or less in the
exceptional facility.

3. There are no significant obstacles between the
monitors and the listener within the standard
equilateral monitoring triangle.

4. The electronic chain is designed for maximum
transparency. Often specialized or customized
components are built which incorporate a bare
minimum of active stages.

5. Monitor loudspeakers and amplifiers have wide
bandwidth, high headroom, and extremely flat
frequency response. Sources of diffraction³ are
minimized. Cabinets are solid and non-resonant,
as is the room, free of sympathetic vibrations and
resonances.

6. Monitors and listener are in a reflection-free
zone,⁴ which means that reflections from nearby
surfaces arrive at the listener at least 20 ms later
than the direct sound (preferably >30 ms) and at
least 15 dB down (preferably >30). This
specification can be determined by time-delay
spectrometry.⁵

The room is large enough to permit even,
extended bass response, with no significant
standing waves. Any remaining standing waves are
controlled using techniques including Helmholtz
resonators or specialized diffusers. Room length
should be at least 20 feet long for stereo, and in a
critical mastering room, at least 30 feet long for
multichannel, so that all speakers can be far enough
from the walls to avoid the bass-resonance proximity
effect.⁶ The room should be wide enough so that
first reflections from the side walls are insignificant,
and/or the side walls are treated to minimize
reflections. Dimensions should be symmetrical
from left to right and a ceiling sloping upwards from
the speaker end (cathedral ceiling) is a plus.

Acoustical design and electrical layout are
accomplished by experienced and trained
professionals.
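The reflection-free zone criterion in item 6 translates directly into extra path length. This is a minimal sketch, assuming a speed of sound of 343 m/s (dry air at about 20 degrees C); the function names and the sample reflections are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value, not from the text


def extra_path_m(delay_ms):
    """Extra distance a reflection must travel to arrive delay_ms
    after the direct sound."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000


def meets_rfz(delay_ms, db_below_direct, min_delay_ms=20.0, min_drop_db=15.0):
    """True if a reflection is both late enough and quiet enough."""
    return delay_ms >= min_delay_ms and db_below_direct >= min_drop_db


print(round(extra_path_m(20), 2))   # metres of extra path for a 20 ms delay
print(round(extra_path_m(30), 2))   # metres of extra path for a 30 ms delay
print(meets_rfz(delay_ms=25, db_below_direct=18))
print(meets_rfz(delay_ms=8, db_below_direct=18))
```

A 20 ms delay corresponds to nearly 7 metres of extra travel, which helps explain why the room dimensions given above are so generous.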

Subwoofers and bass response 

Stereo subwoofers, or prime loudspeakers 
whose response extends to the infrasonic, are 
essential for a good mastering studio. Vocal P pops, 
subway rumble, microphone vibrations, and other 
distortions will be missed without subwoofers, not 
just the lowest notes of the bass. Proper subwoofer 
setup requires knowledge and specialized test 
equipment (see Chapter 14). If subwoofers are 
inaccurately adjusted (e.g., "too hot,” in a vain 
attempt to impress the client) then the results won’t 
translate well to other systems.

Accurate subs are especially important in the 
hip-hop and reggae genres, but serve well to put 
rock and roll in perspective. By having accurately- 
calibrated subwoofers, we master a record that plays 
well on both boomy and thin systems. 




Apparent bass response is also greatly affected
by monitor level. The equal loudness contours
(originally studied by Fletcher and Munson)
dictate that a recording which is mixed at too high a
monitor level will seem bass-shy when auditioned at
a lower level in a typical home environment. Thus,
mixing and mastering at too loud a level is a conceit
which we can ill afford (see Chapter 15).

Monitor Equalization—by ear or by machine? 

An inaccurate or unrefined monitor system not
only causes incorrect equalization, it can also result
in too much equalization. We must use our ear/brain
in conjunction with test instruments to ensure
monitor accuracy. Test equipment alone is not
sufficient. For example, although some degree of
measured high-frequency rolloff usually sounds
best (due to losses in the air), there is no objective
measurement that says, "this rolloff measures
right," only an approximation. Different size rooms,
monitor distances and monitor dispersions change
the rolloff required to make the high end sound right.

Thus, for the high frequencies, the ultimate
monitor tweak must be done by ear. But this leads to
the chicken and egg problem: "If you use recordings
to judge monitors, how do you know that the
recording was done right?" The answer is to use the
finest reference recordings (at least 25 to 50) to
judge the monitors, and take an average. The highs
will vary from a touch dull to a touch bright, but the
majority will be right on if the monitor system is
accurate. I try to avoid adding monitor correction
equalizers; I prefer first to fix the room or replace
the loudspeakers; my techniques include tweaks on
speaker crossover components until the monitors
fall precisely in the middle of the "acceptance
curve" of all 50 reference recordings.
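As a rough illustration of "taking an average" over the reference recordings, the sketch below tallies one by-ear rating per recording. The numeric scale (negative for dull, zero for right, positive for bright), the `tolerance` value, and the function name are all assumptions made for this example; the text itself prescribes no numbers.

```python
from statistics import mean

def monitor_verdict(ratings, tolerance=0.25):
    """Summarize by-ear high-end ratings of reference recordings.

    ratings: one number per recording (hypothetical scale):
             negative = sounds dull, 0 = sounds right, positive = sounds bright.
    tolerance: how far the average may drift from 0 and still count as centered
               (an assumed value, not taken from the text).
    """
    average = mean(ratings)
    share_right = sum(1 for r in ratings if r == 0) / len(ratings)
    if abs(average) <= tolerance:
        # Centered only if most references also sound right individually.
        return "centered" if share_right > 0.5 else "inconclusive"
    return "monitors lean bright" if average > 0 else "monitors lean dull"
```

With 25 references of which 20 sound right, 3 a touch bright and 2 a touch dull, this reports "centered", which matches the situation described above where the majority are right on.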

Note however that a variety of factors - the
number of people in the room, interconnect cable
capacitance, power amplifiers, D/A converters, and
preamplifiers - can all affect low and high
frequency response, so if there are any changes to
these, I immediately reevaluate the monitors’
response with the known 25 best recordings!

Why Accurate Monitors Are Needed 

Here is my bell-curve theory: Work to the middle
of the curve, and you’ll satisfy the maximum number
of listeners. The mastering engineer strives to
create a recording which will play well on the
maximum number of reproduction systems. If you
skew a recording in the bright direction, it will not
play well on a lot of small systems that already have
too much treble; conversely, if you skew it in the
duller or heavier direction, with too much bass, it
will not play well on systems that have too much
bass. Thus, a recording which is well-balanced will
satisfy the maximum number of listeners, as
illustrated with the bell curve in this figure:


[Figure: bell curve. Vertical axis: number of satisfied listeners. Horizontal axis: deviation from accurate frequency response, running from "boomy or muddy" through "accurate/natural" to "thin and/or bright." Caption: A well-balanced recording satisfies the maximum number of listeners.]







The closer we can make the recording reach the
middle of the curve, the more listeners we will
satisfy. An accurate monitor system allows us to
produce recordings which are in the middle of the
curve. We pride ourselves on knowing just how
much bass is going to be right, so that the recording
will play well in a club, or in a small home system.
This doesn’t mean that we’re home-free as soon as
we construct the room with perfect response. For
example, there will always be home and car systems
that distort when certain bass frequencies are
excessive. In this case, experience is the best
teacher; there are bass changes we can make that
will not skew a recording away from the middle of
the bell curve. We always check references on
various example systems. Usually the recording
translates to all of them. If not, a small tweak will
fix the problem, at the frequency we’ve identified
as causing problems on the problem system. We
engineer the change while listening on the accurate
mastering system to confirm we are not skewing the
recording away from the middle, or listeners with
other systems will likely have the opposite problem.
We also keep in mind that the ear hears peaks much
more easily than dips, so we can get away with some
dips if necessary to please a recalcitrant client who
judges everything on a problem system.


"A monitor which makes
everything sound beautiful
must not be accurate."


II. Debunking Monitor myths 

There is some resistance to the theory that you
need accurate monitors, but certainly not among the
majority of mastering engineers. 7 You can’t argue
with success: the most successful mastering
engineers work with wide-range, flat-response
monitor systems.

Myth #1: You must mix (master) with real-world
monitors to make a recording for the real world

Here’s a recent post from a mix engineer on 
Lynn Fuston’s Internet bulletin board: 

In Reply to: Best near-field monitors 

Frankly, I am at the point where I don’t 
like to mix using reference monitors 
anymore. My monitors are so nice to 
listen to, but they are just too unreal. 

They are perfect for critical listening,
but it is easier to make a real-world mix
using an old receiver and a pair of old
JBL home speakers and a boom box.

I don’t care how well you think you 
know your souped up monitors. They 
will convincingly reproduce low 
frequencies that will distort like crazy 
on your neighbor's home stereo, and 
they will produce sparkling high end
that completely disappears on the 
stereo at your mom’s house. 

Beauty versus accuracy? First of all, I doubt the
correspondent was describing accurate monitors. It
sounds like he was describing beautiful monitors,




because they are "so nice to listen to," the polar
opposite of monitors which are "perfect for critical
listening." There are speakers which are
non-discriminatory; you know them well:
everything sounds beautiful on them. A monitor
which makes everything sound beautiful or which
masks the fine differences between sources, must
not be accurate. Beautiful systems are loudspeakers
which are voiced, or which have faults that always
make them sound "beautiful" (such as horn
resonances, smeared imaging, diffraction, and
dispersion qualities that emphasize the ambience in
a source). On the contrary, an accurate monitor is
merciless, revealing all distortions or frequency
anomalies. On my mastering system, excellent
recordings sound wonderful and beautiful, but
inferior recordings do not sound very pleasant.
That’s a characteristic of a monitor system which is
"perfect for critical listening."

Good monitors sound sparkly? This is not
true. Accurate monitors do not sound sparkly. The
poster remarked, "the [good monitors] will
convincingly reproduce low frequencies that will
distort like crazy on your neighbor’s home stereo,
and they will produce sparkling high end that
completely disappears on the stereo at your mom’s
house." I think he must be describing someone
else’s good monitors, because a mastering engineer
listening to accurate monitors will not be tempted to
turn the bass up too far or cut the treble too much.
The poster’s conclusions have to be based on
working with inaccurate, low resolution monitors.

Typical Monitor Speakers? There is no such
thing as a typical or representative small monitor. Just
like the bell curve pictured above, mini-monitors’
frequency responses vary all over the place. 8 Mixing
engineers who believe that their particular flavor of
colored mini-monitor is accurate will produce
mixes with faults that sound bad on other monitors.
Only a few mix engineers with a strong adaptive
ability have learned how to work with small and
near-field monitors and mentally compensate for
their weaknesses. Though this minority of
well-trained mix engineers can get excellent results with
mini-monitors, mastering engineers should never
depend on them.

Most times I can tell an NS-10/nearfield mix
when it arrives for mastering. The bass drum is far
too boomy (a particular problem with NS-10 mixes),
the vocal is often too low (probably caused by center
buildup in the nearfield environment), the reverb is
sometimes too low (the headphone effect enhances
inner details), the midbass of the bass instrument is
depressed (caused by resonances or comb-filtering
artifacts from the console surface), the stereo
separation is very small (imagine a big pair of
headphones), and the high end is, well,
unpredictable. But one time out of ten, I am shocked
and pleased to learn that a mix engineer got good
results with colored mini-monitors. But there’s
more to this story than meets the ear! That mixing
engineer made it a point to take references of the
recording to various places to see how it was
translating, and then made adjustments before
committing the mix. Not all models of nearfield
monitors are tonally colored, so what remains to
conquer are the problems due to their position and
proximity to reflecting surfaces.




When I mix, I choose monitors that are as
accurate as I can obtain for the mixing space. I go to
great effort to locate the monitors on solid stands,
far from obstructions like consoles and racks or
reflections like the control room glass. I may even
move the loudspeakers off the console to the left or
right, which makes producers think I’m crazy until
they sit down and listen. It seems weird to be
moving faders and looking to the side, but not in the
name of getting a great mix. It’s demonstrable that
mix engineers use much less EQ when the monitors
are accurate.

Myth #2: Adding high end helps inferior monitors
that are weak in the highs.

This is an untruth. Firstly, the recording will
sound sharp, tinny and fatiguing on any monitor
that has adequate highs, and there are plenty of such
representatives along the bell curve of inferior
monitors. Next, radio play will suffer, because as
mentioned, the radio limiters will just cut back the
highs that you have added. But most important...

"The Midrange is the Key"

The midrange is the key. As I described in the
chapter on Equalization, adding too much high end
depresses the lower midrange. You may end up with a
vocal which has no power when reproduced on a
limited bandwidth system; for example, adding highs
actually reduces the strength of the male vocal in the
mix when auditioned on a limited bandwidth
system. The major power of a male vocal is in the
fundamental range circa 250 Hz. If that range is
depressed, the recording runs the risk of having a
vocal which will not translate over the widest variety
of systems. Try this: Take a great recording. Play it,
go into the next room and listen. The information
still comes through despite the filtering of the
doorway, carpets and obstacles. Then try filtering
the recording severely below 300 Hz and above 5 kHz
(like the sound of an old, bad cinema loudspeaker).
A good recording will still translate. This tells you
that the midrange is the key. If you lose the
midrange, you lose it all. I am reminded of my first
experience with my audiophile album of Paquito
D’Rivera, Tico Tico. This recording was made with
minimalist miking, no equalization, nor
compression. It has a very natural tonal balance, yet
it plays well everywhere. Why? Because the
midrange is right.
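The "filter severely below 300 Hz and above 5 kHz" experiment can be approximated in software. The sketch below is one possible way to do it, not the method used in the text: it cascades standard second-order high-pass and low-pass biquads (the widely published audio-EQ-cookbook form) in plain Python. The function names, the Q of 0.7071, and the choice of two passes are assumptions made for this example.

```python
import math

def biquad_coeffs(kind, cutoff_hz, sample_rate, q=0.7071):
    """Normalized (b0, b1, b2, a1, a2) for a second-order low- or high-pass."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b0, b1, b2 = (1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0
    elif kind == "highpass":
        b0, b1, b2 = (1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0, a1, a2 = 1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)

def biquad(samples, coeffs):
    """Run one biquad section over a list of float samples (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def midrange_only(samples, sample_rate=44100, low_hz=300.0, high_hz=5000.0, passes=2):
    """Keep roughly 300 Hz to 5 kHz; each pass steepens both slopes."""
    hp = biquad_coeffs("highpass", low_hz, sample_rate)
    lp = biquad_coeffs("lowpass", high_hz, sample_rate)
    for _ in range(passes):
        samples = biquad(biquad(samples, hp), lp)
    return samples

def tone(freq_hz, sample_rate=44100, seconds=0.5):
    """Unit-amplitude sine test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Feeding test tones through `midrange_only` shows the intended behavior: a 1 kHz tone passes almost unchanged, while 50 Hz and 15 kHz tones are strongly attenuated. To try it on music, decode a mono track to a list of floats and listen to the filtered result; if the vocal and the sense of the song survive, the midrange is doing its job.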

Myth #3: Heavy compression is necessary to prevent
small monitor systems from overloading.

I have found the opposite to be true, with few 
exceptions. When I take my dynamic, impacting 
masters to a little Aiwa 3-piece system, they sound
(comparatively) compressed, with fewer transients 
and less impact. If I reduced the transient clarity in 
the mastering, it would only sound worse on the 
smaller system, which does its own compressing! 

I believe that high-quality monitoring is nearly
as important for mix engineers as for mastering, 
because the mini-monitors don’t reveal the damage 
of all those tempting low-resolution plugins and 
overcompression, and then it’s too late to fix. 




III. Refinements 

Alternate Monitoring Systems

Mastering engineers use alternate loudspeakers
as a double-check, not as a benchmark. I place all
alternate monitoring systems outside the mastering
room. Having an alternate system in the mastering
room wastes time, and confuses the client.
Furthermore, the alternate loudspeakers are likely
to interfere acoustically with the main system. It is
better to focus on a single monitor system that will
not fool you into making wrong judgments. At
Digital Domain, I have a second system in a separate
room that I can feed "live" from the mastering room.
This system has very large, "loose-sounding" woofers,
and represents one extreme in the acceptance bell
curve. It is fairly representative of what may happen
to the bottom end of the recording in a club, and
somewhat helps interpret what may happen in a car,
though cars are so unpredictable that about all we
can say is they will have very uneven bass response
and a resonance at one or more low bass frequencies.
Plus there are the user controls, which we often
find set in the smile shape (as in this photo).

I’ve learned to watch out for recordings where
the client is looking for very hot bass or bass drum,
and I use the "extreme" system to demonstrate what
could happen if they push things too far! Because if
we boost the record’s bass in the mastering room to
get that sound, it won’t sound right anywhere else.
It’ll actually overdrive a typical car system. Many
clients are not used to a neutral reproduction
system; the hip-hop or reggae client may want it to
sound in the mastering room like it does in his car.
The boomy alternate listening room does the trick.

One mastering studio has a radio station
transmitter and processor in their machine room,
and invites the client out to their car to hear what it
will sound like on the radio. This is a great idea, as
long as the client is realistic about the limitations of
the car system, for if you make the recording bright
enough for most cars, it will screech on any decent
system. In other words, use the car system as an
example of an extreme, not the least common
denominator.

Narrowcasting 

There are boombox systems, club systems and
car systems especially engineered for music such as
hip hop whose bass response/resonance is
extremely exaggerated. Properly-engineered
recordings sound so thick on these systems that the



Here’s what we're up against. 





vocals are almost completely lost! It is almost
impossible to make a master that plays well on such
an extreme system that doesn’t sound thin and
lifeless on all the others. We cannot include these
extreme systems when making a master if we want to
please the maximum number of listeners. Instead,
the best solution is to make a separate (dedicated)
master for the club(s) or venues.

IV. In Summary 

The major goal of the mastering studio is to
make subjective judgments as objectively as
possible. Mastering engineers confirm that accurate
monitoring is essential to making a recording that
will translate to the real world. The fallacy of
depending on an inaccurate "real-world monitor"
can only result in a recording that is bound to sound
bad on a different "real-world monitor."

Even the best master will sound different
everywhere, but it will sound most correct on an
accurate monitor system. Which leads us to this
comment from a good client:

I listened to the master on half a dozen
systems and took copious notes. All
the notes cancelled out, so the master
must be just right!


1 See the section on comb-filtering in Chapter 3. Jim Johnston (private
correspondence) notes that nearfields use a completely different listening
method than what almost anyone uses in the real world, i.e., most real-world
listeners, other than boombox and headphone listeners, are well into the
diffuse field of the room.

2 NC 30: Noise criterion 30 decibels, follows an attenuation curve whereby at 1
kHz noise level is 30 dB, and at lower frequencies is permitted to rise.

3 Very few near-field monitors pass the "bandwidth and compression test."
Almost none have sufficient low frequency response to judge bass and
subsonic problems, and very few can tolerate the instantaneous transients and
power levels of music without monitor compression. If your monitors are
already compressing, how can you judge your own use of compression?
Diffraction is the bounce of an acoustic wavefront from cabinet edges, causing
a "smearing" of the sound quality. This can be reduced by using round instead
of sharp cabinet edges, and soft materials on the edge instead of hard.

4 This term was coined by Dr. Peter D’Antonio of RPG.

5 The advantage of this monitoring environment is that time-domain errors in
the musical material will be more audible, since they will not be masked or
smeared by the monitoring room itself. Time delay-based measurement used
to be extremely expensive, but has reached affordability with the advent of fast
personal computers and decent audio software. In the absence of TDS
equipment, an objective subjective test called the LEDR test can help
determine if nearby reflections are interfering with the monitoring. LEDR
(Listening Environment Diagnostic Recording) is available from Chesky
Records (http://www.chesky.com) on JD37. First play the announce track and
confirm that the announcer’s positions are correct. If not, then adjust speaker
separation and angle. Then play the LEDR test. The beyond signal should
extend about 1 foot to the left and right of the speakers. If not, then look for
side wall reflections. Similarly, the up signal should rise straight up, 3 to 6 feet,
and the over signal should be a rainbow rising at least as high as the up. If not,
look for interfering objects above and between the speakers, or defective
drivers or crossovers. Frequency response of left/right pairs must be
well-matched for a perfect LEDR score.

6 Unless the speakers are placed in soffits within the wall structure, which
requires considerable acoustical expertise. It’s much easier to design a room
with free-standing loudspeakers.

7 There are a few major mastering engineers left who use non-standard,
non-flat monitors, but like the best mix engineers, they have learned their tools
and know how to make a master translate to the world. However, I would not
advise that a new mastering engineer start out this way. Very few people have
the ability to adjust their inner hearing this well. Similarly, some mastering
engineers skew their monitor systems by using underpowered tube power
amplifiers to make their judgments, which I feel is dangerous, as the natural
compression of tubes may prevent them from knowing if a recording needs
some compression or may mask overcompression. See Chapter 10,
Compression and Monitoring. Tubes can work in a high-powered monitor
amp, and hundreds of watts are required to keep tube amplifiers feeding
typical inefficient loudspeakers from skewing in the overcompressed
direction.

8 The LSR series from JBL accomplishes the most appropriate best compromise
in monitor accuracy. Each smaller monitor in the series has a strong family
resemblance to its larger cousins, with very linear frequency response down to
its bandwidth limit. Which means that when placed in a linear environment
(rarely encountered on the top of a console), the smaller LSRs will only be
missing the extended portion of the low range, with the rest being pretty
accurate.





"We’ll fix it in the mix."
- Anon


PART II: MASTERING TECHNIQUES


"It’s not how loud you make it, it’s how you make it loud."
- Bob Katz


Chapter 7


Putting the Album Together


Introduction 

Sergeant Pepper is often cited as the first rock
and roll concept album, i.e. an elaborately-designed
album organized around a central theme that
allegedly makes the music more than a simple
collection of songs. This started a trend in the 70’s
that many assume has more or less died. But is the
concept album really dead? I’m not so sure; I treat
every album that comes for mastering as a concept
album, even if it doesn’t have a fancy theme, artwork
or gatefold. The way the songs are spaced and
leveled contributes greatly to the listener’s
emotional response and overall enjoyment of the
album. It is possible to turn a good album into a
great album just by choosing the right song order,
though, unfortunately, the converse is also true.

I. Sequencing: How to Put an Album in Order

Sequencing is an art. Sometimes, the musicians
making an album have a good idea of the song order
they’d like to use, but many people need help with
this tricky chore. Traditionally, the label’s A&R
person would help put the album in order, but in
today’s world of independent productions that
service is not always available. This is frequently the
producer’s job, or clearly someone experienced,
politically "neutral"* and esthetically inclined. A
mastering engineer bridges the nebulous division
between artist, producer, and engineer. Having
heard thousands of albums and being au courant, he
may provide useful guidance during this process.

* Albums produced by a band member(s) sometimes suffer from the more me
syndrome, where each musician wants to hear his or her instrument louder.
The only way to avoid more me is to use a producer/engineer who has no
"political" alliances and is working for the concept of the album as a whole.





This is my approach: First let me tell you what
usually does not work: don’t try to respond
intellectually. One musician thought it would be a
good idea to order his album by the themes
presented in the lyrics; he started with all the songs
about love, followed by the songs about hate, and
finally the songs about reconciliation. It was a
musical disaster. The beginning of his album
sounded musically repetitive, because all his love
songs tended to use the same style, and
furthermore, the progression of intellectual ideas
simply was not obvious to the average listener, who
primarily reacted to the musical changes. Even
when the listener got the intellectual point, it didn’t
contribute much to the enjoyment of the album.
Listening to music is first and foremost an
emotional experience. If we were dealing here
with lyrics (poetry) without music, perhaps the
intellectual order would be best, but the intellectual
point of the album will still come through, even if the
songs are organized for primarily musical reasons.

Before proceeding to order the album, it’s
important to have its gestalt in mind: its sound, its
feel, its ups and downs. I like to think of an album in
terms of a concert. Concerts are usually organized
into sets, with pauses between the sets when the
artist can catch her breath, talk briefly to the
audience, and prepare the audience for the mood of
the next set. On an album, a set can consist of only
one song, but most often is three or four. There are
no strict rules, but usually the space between sets is
a little greater than the typical space between the
songs of a set, in order to establish a breather, or
mood change.* Sometimes there can be a long segue
(crossfade) between the last song of a set and the
first of the next. These basic principles apply to all
kinds of music, vocal and instrumental.

Now comes the job of organizing the sets. To
make it easier, I usually prepare a rough CD of all
the songs, or a playlist on a DAW (my favorite) to
allow instant play of all the candidates. This is a lot
easier than it was in the days of analog tape. Then I
make a simple list, describing each song’s
characteristics in one or two words or symbols, such as
uptempo, midtempo, ballad. Sometimes I’ll give letter
grades to indicate which songs are the most exciting
or interesting, trying to place some of the highest
grade songs early in the order.† I may note the key of
the song, although this is usually secondary
compared to its mood and how it kicks off. If there’s
a bothersome clash in keys, sometimes more
spacing helps to clear the ear, or else I exchange that
song with one that has a similar feel and compatible
key.

The opening track is the most important; it sets
the tone for the whole album and must favorably
prejudice the listener. It doesn’t have to be the hit or
the single, but almost always should be up-tempo
and establish the excitement of the album. Even if
it’s an album of ballads, the first song should be the
one that hits the listener’s heart and soul.

If the first song was (hopefully) exciting, we
usually try to extend the mood, keep things moving
just like a concert, by a short space, followed by an

* Similarly, classical albums have shorter spaces between movements than
between the major numbers.

† That’s life. Not every song is a masterpiece, but it’s important to give your best
impression as early as possible.





up- or mid-tempo follow-up. Then, it’s a matter of
deciding when to take the audience down for a
breather. Shall it be a three- or four-song set? I
examine the other available songs, then decide if it
will be a progression of a mid-tempo or fast third
song followed by a relaxed fourth, or end with a nice
relaxed third song.

At this point, there are track numbers penciled
next to the candidates for the first set of the album. I
play the beginning of the first song to see how it
works as an opener, then skip to the last 30 or 40
seconds, play it out and jump to the start of the
second song to see if that works. The listener
actually reacts more to the musical transition than to
the entire feel of the previous song. This is how to
join different musical feels: an up-tempo song that
comes down gently at the end can easily lead to a
ballad. If the set doesn’t flow, I substitute songs
until it works.

Then, I check off the songs already used on the
list, and pick candidates for the second set, usually
starting with an up-tempo in a similar "concert"
pattern. This can be reversed, of course; some sets
may begin with a ballad and end with a rip-roaring
number, largely depending on the ending mood
from the previous set. A set can also be a roller
coaster ride, depending on the mood we want to
create. Regardless, when you consider the album in
terms of sets, it becomes a lot easier to organize. By
the way, the ultimate listener doesn’t usually realize
that there are sets; our work ends up as only a
subliminal contribution to the feel. As the set list
gets filled up, it becomes a jigsaw puzzle to make the
remaining pieces fit. Perhaps the third or fourth set
doesn’t work quite as well as the first. Perhaps one
of the songs just doesn’t transition into the other. At
that point I try a one-song set, or see if this problem
song works better in an earlier set, either replacing
a song, or adding to the earlier set. It can get
frustrating, but it will all come together in time.

The Odd Man Out 

One song may just not fit well musically with the
rest. For a Brazilian samba album which I was
mastering, the artist also recorded a semi-rock
blues number. She said everyone loved this song in
Brazil, so we couldn’t excise it from the album, but
stylistically it did not seem to gel as a part of any set.
At first I suggested putting it last as a "bonus track,"
but this ruined the feel of the original album ending,
which was a beautiful, introspective song that really
did belong at the end. Eventually, we found a place
for the offender near the middle of the sequence, as
a one-song set, with a long-enough pause before
and after. It served as a bridge between the two
halves of the album.

The Right Kind of Ending 

So, how to end the album? What is the final
encore in a concert? It’s almost never a big,
uptempo number, because the audience always cries
"more, more, more." You’ve got to leave them in a
relaxed, comfortable "goodbye mood," otherwise
you’ll be playing encores forever. That’s why the last
encore is usually an intimate number, or a solo, with
fewer members of the band. The same principle
applies with the record album. I usually try to create
a climax, followed by a dénouement. The climax is
obviously an exciting song that ends with a nice
peak. This is followed by one or two easy-going songs




to close out the album. When I find the perfect
sequence, it’s a real treat!

II. Spacing the Album

The first thing to remember is never to count
the seconds between songs. Experienced producers
know that the old "4 second," "3 second," or "2
second" rule really does not apply, although it is
clear that album track spacing has gotten shorter
over the past 50 years, along with the increased pace
of daily life. The correct space between songs can
never accurately be estimated or counted, so putting
an exact number on it is probably meaningless.
Different people start counting at different times;
the last few moments of a decay often signal the feel
of the space between the tunes. The computer may
objectively say that a space is only 1 second, but the
ear may feel it’s closer to 3.5. So I’ve stopped
counting seconds, and just go by the feel. As a
general rule, the space between two fast songs is
usually short, the space between a fast and a slow
song is medium length, and the space between a
slow and a fast song is usually long. The space
following a fadeout is usually very short, because the
listener in a noisy room or car doesn’t notice the tail
of a fadeout. Often we have to shorten fadeouts and
make segues* or the space will seem like forever at
home and especially in the car. Spacing is also
dependent on the mood of the producer and time
of day. If you space an album in the morning when
you’re relaxed, it almost always sounds more
leisurely than one which has been paced in the
afternoon, when hearts are beating faster. The
solution is to be aware of your inner self and not
* Segue (pronounced seg-way): a crossfade or overlap of two elements. Webster’s:
proceed without interruption. Italian: seguire, to follow.


make too short a space when you’re in a fast mood, 
or too long a space when you’re very relaxed; the 
result will probably average out for the listener. 
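The general rule above can be jotted down as a small lookup, purely as a starting point before the ear takes over. This is a sketch: the function name and the "fast"/"slow" labels are assumptions for the example, and it deliberately returns qualitative lengths rather than seconds, in keeping with the advice never to count.

```python
def suggested_gap(prev_tempo, next_tempo, prev_ends_in_fadeout=False):
    """Return a qualitative starting point for the space between two songs.

    prev_tempo, next_tempo: "fast" or "slow" (hypothetical labels).
    prev_ends_in_fadeout: True if the previous song fades out.
    """
    # A fadeout's tail already reads as silence in a noisy room or car.
    if prev_ends_in_fadeout:
        return "very short"
    rules = {
        ("fast", "fast"): "short",
        ("fast", "slow"): "medium",
        ("slow", "fast"): "long",
    }
    # Combinations the rule does not cover are left entirely to the ear.
    return rules.get((prev_tempo, next_tempo), "by ear")
```

For example, `suggested_gap("slow", "fast")` gives "long", while a slow song followed by another slow song falls outside the stated rule and comes back as "by ear".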

Consider the pace of an album, which is affected
by intertrack spacing. As described above, we often
want the first set to be exciting, so you may want to
control the pace by using shorter spaces within the
first set and then slightly longer spaces thereafter.
Tricks like these have some psychological power
over the listener. An interesting observation is that
if you start with tight spaces and then make the rest
of the spaces "normal," the normal spaces seem too
long, because your internal sense of timing has been
altered by the pace of the first section. Manipulate
spaces to produce special effects: surprises,
super-quick and super-long pauses make great effects.
One client wanted to have a long space in the middle
of his CD, about 8-10 seconds, to simulate the
change of sides of an LP. Rather than rejecting his
idea out of hand (always respect the input of creative
individuals), I tried the super-long space, and it
worked! This was largely due to his choices of songs
and the order. The set which began side two had a
significantly different feel, and the long space
helped to set it off, like a concert intermission.

Some engineers like to think of spaces as
punctuation marks. There’s a comma space, a
semicolon, and a period. Never judge a space by
dropping the needle on the record, that is, by
auditioning 30 seconds or so of one tune’s tail
followed by the beginning of the next. Inevitably the
listener will need a bit more of a breath before
starting the next, especially if it’s the space between





two sets. That period space won’t feel like a period 
whenyou've heard the entire song, orthe whole set 
in context. Experience teaches us to anticípate these 
effects. so we add more of a breath after an exciting 
song and we know to preview far enough baek to get 
more of the holistic feel. Still, sometimes the first 
CD reference needs spacing adjustments. 

For a fast-paced pop album, if in doubt I prefer
to make a space too short rather than too long. I
sometimes will cut a space shorter and shorter until
it is obviously too short and then add just the
soupçon necessary to make it sound "just right,"
especially knowing that it always seems longer at
home. Then there’s the question of the ideal space,
when the rhythm of the previous song leads very 
well into the attack of the next, where we count 
beats, and make the following song land on the beat. 
Finally, there’s the mystery space, where it’s not 
obvious what will work best. So, I try both long and 
short spaces, inching them up or down until it’s 
obvious which approach is best. 
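
The beat-counting approach to the "ideal space" can be sketched as simple arithmetic. This is only an illustration, assuming a steady tempo; the function name and its parameters are hypothetical and do not come from any particular workstation.

```python
def next_downbeat_time(last_beat_s: float, tempo_bpm: float, beats_to_wait: int) -> float:
    """Return the time (in seconds) at which the next song should start
    so that its downbeat lands on the beat grid of the previous song.

    Assumes a constant tempo; real music usually needs adjustment by ear.
    """
    if tempo_bpm <= 0:
        raise ValueError("tempo_bpm must be positive")
    beat_duration_s = 60.0 / tempo_bpm  # length of one beat in seconds
    return last_beat_s + beats_to_wait * beat_duration_s


# Previous song's last beat falls at 200.0 s, tempo is 120 BPM,
# and we let four beats pass before the next song starts.
start_s = next_downbeat_time(200.0, 120.0, 4)
```

The result is only a starting point; the final spacing is still judged by listening in context.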

We didn’t have this kind of luxury in the days of
analog tape, and it’s interesting to note that when an
LP master comes in for conversion to CD the spaces
always seem too long. One reason, as I’ve said
before, is the current quicker pace of life, but the
other is that vinyl noise acts as a filler. When there’s
dead silence between tracks, spaces always seem
longer. I may remove 2 or more seconds out of an LP
space and it will feel just fine on CD.


III. PQ Coding 

Spaces and PQ (Track) Coding 

The CD Redbook standard does not permit 
official pauses shorter than 2 seconds between 
tracks. This doesn’t mean you cannot have a
one-second or shorter space between songs; it only
means that there will be no official pause between
tracks, where the CD player would be counting
backwards (officially, this is called Index Zero).
Instead, the next track mark also functions as the
end mark of the previous track. 

When two songs segue into one another, the 
placement of the next track mark is critical, because 
CD players take finite time to cue—up to about 5 
SMPTE frames, for older players. So if there is an 
overlap where the previous song is fading out on top 
of the next, the track mark has to be placed 
extremely close to the top of the next song, or
slow-cuing CD players will reveal a piece of the previous
sound.* Sometimes this cannot be avoided, but
many times an experienced mastering engineer will
find a solution. Live albums with applause require
special attention to both editing and PQ coding;
fading up and down between songs is very
disconcerting to the listener. I prefer a delicately-edited
album that sounds like a continuous concert. But
then comes the decision of where to put the track
marks, because there are no dead spaces. For track
beginnings, I keep in mind that the fastest CD
players take 1 SMPTE frame to cue and the slowest
about 5 frames, and try to find a track position that
doesn’t reveal the previous noise, or up-cut the
downbeat of the track. It's an art and a science, and
often a compromise when a previous noise comes very
close to the downbeat, as illustrated here.

* Conversely, there are one or two slow CD players that cue too late, missing the
downbeat if the track mark is too close.

[Figure: Track mark placed very tight to the downbeat with no offset to avoid hearing
talking which comes before the mark.]

Hiding Information in the Gap

When a cut from a concert album is played on the
radio, it’s often desirable to start the tune on the
downbeat, but the listener at home wants to hear the
atmosphere between cuts and the artists’ charismatic
introductions. To accomplish this dual feat, the
creative mastering engineer takes advantage of the
compact disc’s Index 0 and Index 1 time, as in the
following figure.


[Figure: Timeline showing Song (Track 9), followed by Applause and Intro to
Track 10 placed between Index 0 and Index 1 for Track 10, followed by Song (Track 10).]


In this example, the song for track 9 ends with
applause, and the official end of song 9 is at the
Index 0. The time between Index 0 and Index 1 is
called the pause or gap time, during which the CD
player counts backwards to zero, but in this case
there is sound in the gap. This permits the CD
player's random play function to ignore the boring
or irrelevant parts. Similarly, the introductions,
count offs, sticks, and so on, for songs on any album
can be placed in the gap so they will not be heard on
the radio or in random play. Note that by putting the
speeches into the pause time, they do not increase
the official length of either track. Unfortunately, the
most primitive CD players only respect Index 1, so
the introduction would be treated as the end of the
previous track, producing some incongruous results
in random play. Furthermore, many current
computer (software-based) CD players and many
modern-day DVD players also ignore Index 0, which
is destroying a critical part of the artistry of the
compact disc.* To top it off, most DVD players cue
CDs very loosely, revealing unintended material.
Alert your congressman, err, rather, licensors Sony
and Philips, that the CD standard is rapidly eroding,
hindering the artistry that we have enjoyed for over
20 years. Regardless, I always PQ code masters
assuming they will be played on CD players that
respect the standard; there is little other choice.

In this vein, it pays to be vigilant, for many CDR
duplicators will mute the pause audio, sometimes
even taking many seconds OUT and putting just 2
blank seconds IN (the minimum pause length in the
CD standard). Imagine your classic Pink Floyd The
Wall, which has continuous sound, being gapped by
accident at the plant. These copiers were found to be
copying in Track-At-Once mode, rather than
Disc-At-Once, instead of simply cloning the disc.†
Certainly frustrating.


* The second disk of a multi-disk set that has a start ID higher than 1 will crash
many computers, according to Bob Olhsson, in correspondence.
† Thanks to Dan Stout for this information, as viewed on the excellent Mastering
Webboard.


PQ Offsets 

Since CD players can vary in their reaction
times, the editing program can apply typical offsets,
or show the PQ codes exactly as they will appear on
the disc. For example, a start time offset of 12 CD
frames* means that the actual track mark will be 12
frames (160 ms) in front of its visual location on the
screen if you choose to display the mark without the
offset. Sophisticated DAWs let you rehearse the
effect of cuing with or without the offsets.
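
The 160 ms figure follows directly from the CD frame rate of 75 frames per second. The sketch below works through that arithmetic; the function names are hypothetical and are not taken from any editing program.

```python
CD_FRAMES_PER_SECOND = 75  # CD frames, not SMPTE frames


def cd_frames_to_ms(frames: int) -> float:
    """Convert a count of CD frames to milliseconds."""
    return frames * 1000.0 / CD_FRAMES_PER_SECOND


def apply_start_offset(displayed_frame: int, offset_frames: int) -> int:
    """Return the actual track mark position (in CD frames) when the mark
    is written offset_frames ahead of where it is displayed on screen.
    The result is clamped so it never goes before the start of the disc."""
    return max(0, displayed_frame - offset_frames)


offset_ms = cd_frames_to_ms(12)              # 160.0
actual_mark = apply_start_offset(7500, 12)   # 7488, i.e. 12 frames earlier
```

A mark displayed at frame 7500 (100 seconds in) would therefore be written at frame 7488, giving a slow-cuing player 160 ms of head start.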

Redbook† Limits

The Redbook specifies the Compact Disc. A CD 
may have up to 99 tracks and each of these tracks 
may have up to 99 indexes (AKA subindices). Rarely
do we code CDs with indexes since many players do
not support them and most people don’t know how
to use them. Classical engineers used to code each
major piece with a track mark and the movements
within via indexes. But today most classical CDs
place a track mark for each succeeding piece. 

The minimum CD track length is 4 seconds. 
Mastering engineers have been known to create a 
hidden track by inserting many short, blank 4- 
second "tracks” at the end of the CD prior to the 
"hidden” one. 
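
The limits quoted in this section (99 tracks, 99 indexes per track, 4-second minimum track length) lend themselves to a simple pre-flight check. The following is a minimal sketch; the `Track` class and `check_redbook_limits` function are hypothetical names for illustration, and only the three limits stated above are checked.

```python
from dataclasses import dataclass
from typing import List

MAX_TRACKS = 99
MAX_INDEXES_PER_TRACK = 99
MIN_TRACK_SECONDS = 4.0


@dataclass
class Track:
    length_s: float        # track length in seconds
    index_count: int = 1   # number of indexes coded within the track


def check_redbook_limits(tracks: List[Track]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if len(tracks) > MAX_TRACKS:
        problems.append(f"{len(tracks)} tracks exceeds the limit of {MAX_TRACKS}")
    for number, track in enumerate(tracks, start=1):
        if track.length_s < MIN_TRACK_SECONDS:
            problems.append(f"track {number} is shorter than {MIN_TRACK_SECONDS} s")
        if track.index_count > MAX_INDEXES_PER_TRACK:
            problems.append(f"track {number} has more than {MAX_INDEXES_PER_TRACK} indexes")
    return problems


# A "hidden track" layout: one song, ten blank 4-second tracks, then the hidden song.
layout = [Track(180.0)] + [Track(4.0)] * 10 + [Track(200.0)]
issues = check_redbook_limits(layout)  # no problems expected for this layout
```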

Disc-At-Once, Track-At-Once and 
Standalone CD Recorders 

I would never use a standalone CD recorder to 
make CDRs for replication. There is no provision
for Index 0, and the location of Index 1 (the track
mark) can only be as accurate as a manual button
push. Plus, when recording one track at a time,
these standalone recorders work in Track-At-Once
mode, which puts an E32 error onto the disc
wherever the laser stops recording. Computer-based
machines should be set to work in Disc-At-Once
mode, which means that the CD must be written in
one continuous pass.

PQs and Processor Latency 

Since I like to master on loadout, with all 
processors in line, I have to consider the latency 
(delay) of all the processors, which I have seen up to 
12 SMPTE frames with a full chain including up- 
and down-sampling and the linear-phase
equalizer, which has a tremendous processor
latency. The trick is to measure the delay and slide
the PQ marks by this amount. 
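
As a rough illustration of sliding the marks, the sketch below converts a latency measured in SMPTE frames (30 per second) into CD frames (75 per second) and shifts every mark by that amount. The function names are hypothetical, and the direction of the shift (later in time, because the processed audio arrives late) is an assumption that should be verified on the actual system.

```python
SMPTE_FRAMES_PER_SECOND = 30
CD_FRAMES_PER_SECOND = 75


def smpte_to_cd_frames(smpte_frames: float) -> int:
    """Convert a duration in SMPTE frames to the nearest whole CD frame."""
    return round(smpte_frames * CD_FRAMES_PER_SECOND / SMPTE_FRAMES_PER_SECOND)


def slide_marks(marks_cd_frames, latency_cd_frames):
    """Shift every PQ mark later by the measured processor latency
    (assumed direction: the processed audio is delayed, so marks follow it)."""
    return [mark + latency_cd_frames for mark in marks_cd_frames]


latency = smpte_to_cd_frames(12)             # 12 SMPTE frames = 0.4 s = 30 CD frames
shifted = slide_marks([150, 13500], latency)  # [180, 13530]
```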

Hidden Tracks in Pregap 

Some CD players have the ability to rewind in 
front of track one; this is called the pregap or first 
Index o. One company claimed to have the rights to 
putting hidden tracks in that position, but it’s not 
even permitted in the Redbook standard, and 
many plants will not press CDs with a hidden track 
in the pregap. To the best of my knowledge, there is 
no way to produce a DDP with this feature, so only 
CDR masters can be produced in this way if the 
DAW allows it. 


* There are 75 CD frames in a second, as opposed to SMPTE frames, 30 per second.
† The Redbook defines the standards for the audio CD as defined by Sony and
Philips.





IV. Editing 

I love the art of editing, because it gives instant 
gratification. There’s nothing like generating a
hundred smiles in a day, one after each successful
edit! I think a whole book should be written on
editing techniques, but ultimately the skill of fine
editing can only be learned through guided
experience: the school of hard knocks, and an
apprenticeship. A good mastering engineer has a
well-developed editing esthetic, which helps us
turn a rough-hewn work into an audio masterpiece.

The purpose of this short section is to discuss 
some of what is possible in digital audio editing, and
what is expected of a good audio master. Using
sophisticated workstations, we can perform edits
that were impossible in the days of analog tape and
the razor blade. I once spent 30 hours painstakingly
editing a spoken-word version of a novel, a task
which now might be accomplished in a single day. 
SADiE’s playlist-editing mode makes this real easy. 

The Tale of the Head and Tail 

Editing heads and tails is an important skill 
born of experience and musical knowledge. 

Head noise cleanup. Because mechanical 
artifacts can easily distract the listener’s attention 
from the emotional feel and involvement in the
music, a mastered work should feel consistent and
smooth (unless a jarring, jumpy style is intended).
For example, mastering workstations allow us to edit
the beginning of a song with a careful fade-up.
Sometimes this fade-up is made fast (equivalent to a
90 degree cut), because for some music the
downbeat is king. But a fast fade-up often sounds
wrong with soft music, especially pieces that begin
with solo vocal or acoustic instruments. A delicate
acoustic guitar solo can sound abrupt if the noise of
the room and preamp noise is suddenly brought up
from silence. Unless we perform just the right speed
and shape of fade-up the air (roomtone) noise will
call attention to itself.

Natural Anticipation. We also have to be aware 
of the important role played by natural anticipation: 
the human breath before the vocal; or the movement 
of the guitarist’s hand before a strum; or the 
movement of the fingers and keys prior to hearing a
piano downbeat. Often it sounds unnatural to cut off
these kinds of anticipation; I dislike openings of
songs that sound choked because the recording
engineer has cut off the air or space or breath or 
even subtle movements of the musicians. If the 
breath is better included, but sounds a bit loud, then 
a gentle fade-up can produce just the right esthetic. 

I advise mixing engineers not to cut off the tops 
when sending songs for mastering, for the 
mastering engineer probably has better tools to fix 
these, and a quiet, meditative environment to make 
these artistic decisions properly. 60% of the time, 
I'll remove these extra noises, but use the rest to 
good advantage to help the subliminal feel and pace 
of the album.

Tail Noise Cleanup. Sometimes the tail end of a 
song contains noise from musicians or equipment, 
which draws attention to itself by the transition 
from noise to the silence between pieces. The 
simplest and most common solution is called a
follow fade, which is usually a cosine or S-shaped
fade to silence. A good mastering engineer may
spend a minute or more on such a fade to ensure
that the tail ambience or reverberation does not feel
cut off, whilst at the same time, the hiss or noise is
brought to silence at just the right speed so that it
isn’t noticed. We can take advantage of the fact that
hiss and noise are masked by signal of the right
amplitude, so the follow fade can and should be
slightly slower than the natural decay. The delicate
decay of a piano chord at the end of a tune should
feel like it's ending naturally, even while avoiding
the thump of the release of the pedal. Some
sophisticated mastering workstations contain reverse S
curves, allowing us to raise the gain at the tail, after
having previously lowered it, in order to hear some
fine inner detail. 
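
For readers who want to see what a cosine-shaped fade looks like numerically, here is a minimal sketch of a raised-cosine gain curve applied to the end of a list of samples. It is an illustration only: the function names are hypothetical, and a real workstation offers far finer control over the length and shape of the curve.

```python
import math


def cosine_fade_out(num_samples: int):
    """Return gain values that fall from 1.0 to 0.0 along a raised-cosine
    (S-shaped) curve: gentle at the start, steeper in the middle, gentle at the end."""
    if num_samples < 2:
        raise ValueError("a fade needs at least two samples")
    return [
        0.5 * (1.0 + math.cos(math.pi * i / (num_samples - 1)))
        for i in range(num_samples)
    ]


def apply_fade_out(samples, fade_len: int):
    """Apply the cosine fade to the last fade_len samples, leaving the rest untouched."""
    if fade_len > len(samples):
        raise ValueError("fade is longer than the audio")
    gains = cosine_fade_out(fade_len)
    start = len(samples) - fade_len
    faded_tail = [s * g for s, g in zip(samples[start:], gains)]
    return list(samples[:start]) + faded_tail


# Eight samples of constant level, with the last four faded to silence.
faded = apply_fade_out([1.0] * 8, 4)
```

Choosing the fade length so that the curve is slightly slower than the natural decay remains a listening decision, not something the formula can supply.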

Fadeouts. I think a good-sounding musical 
fadeout is one that makes us think the music is still 
going on; we’re still tapping our feet even after the 
sound has ceased. Although we can apply the same 
cosine shape we use for tails, fadeouts are a distinct
art in themselves. Typically, a fadeout will start
slowly, and then taper off rapidly, mimicking the
natural hand movement on a fader because most
people don’t like to sit and listen too long to a fade
that lingers. On the other hand, a fadeout should not
sound like it fell off a cliff, and often in mastering
we get material that has to be repaired because the
mix engineer dropped the tail of the fade too fast.
Since editing is like whittling soap, I recommend
that mix engineers send unfaded material so it can
be refined in the mastering. It is difficult to
satisfactorily repair a fade that was too fast at the
end; sometimes an S-shape helps, and sometimes
we can apply a taper on top of the original taper.

Adding tails. Although editing is like whittling 
soap, sometimes we’re called upon to make more 
soap. And the soap we create can sound more 
authentic than what had to be cut away! If the 
musicians or instruments make a distracting noise 
during the ambient decay, the ambience will sound
cheated or cut off if we perform a follow fade to 
remove the noises. In the figure below is a fadeout, 
to the right of which you can see the noise made by 
the musicians. Unfortunately, these noises occurred 
during the reverberant tail, so the ambience sounds 
cut off. The trick is to feed just the tail of the music 
into a high-quality artificial reverb and capture that 
in the workstation, which you can see in the bottom 
panel. Also notice that the predelay of the reverb
postpones its onset. This can be adjusted in the
mastering DAW's crossfade window which allows us
to carefully shape, time, and adjust the level of the
transition to this artificial reverb in a manner that
can sound completely seamless. Thus we have
performed the impossible: putting the soap back on
the sculpture!

[Figure: Adding a tail via a crossfade to artificial reverb.]

Sometimes an analog tape may have a lot of
echoey print through or hiss noticeable at the tail of
the tune. If adding tails with reverb does not work
well, in this case it is advisable to edit to the digital
safety version of the mix, so I advise clients to send
both versions. 


Adding Room Tone 

Room tone is essential between tracks of much
natural acoustic and classical music. Recording
engineers should bring samples of room tone to an
editing session. Room tone is usually not necessary
for pop productions, but if a recording gets very soft
and you can hear the noise of the room, going
sharply to audio black can be disconcerting.
The object is not to draw the listener’s attention to
the onslaught or removal of noise, as illustrated in
the figure below.

[Figure: Editing room tone in an acoustic work requires considerable artistry.
An edit must not call attention to itself. Panel labels: follow fadeout to remove
musicians' noises; decay of the previous track; edit within the room tone; fade-up
on room tone; medium-fast fade-up on breath of next track and slow fadeout of
room tone.]

Room tone should be recorded in advance as a
separate "silent take" with no musicians in the room.
If the room tone was not supplied in a separate take
by the mixing engineer (at least 4, preferably 10
seconds or more), it is almost impossible for us to
manufacture a convincing transition and we have to
be satisfied with a fade to/from silence. In stubborn
cases I have manufactured a matched room tone by
shaping pink noise, but it can be a very
time-consuming (thus expensive) process.

Repairing Bad Edits. One type of bad edit is 
where the reverberation of one take has been cut off 
by the insertion of a new one. This is a classic error
caused by the producer instructing the musicians to
begin the retake exactly at the intended edit point,
instead of a few bars earlier, a much better practice
which would not only give the musicians a running
start, but also generate the reverberant decay of the
preceding note for the editor to work with. Because
the producer did not record the reverberation, the
ear notices the cutoff of the reverb, which is not
masked by the transient attack of the next downbeat.
Luckily, when it comes to mastering, we can repair
some of these bad edits even if the original takes are
not available. The trick is to separate the original
take and the insert at the edit point, use an artificial
reverb chamber to re-create the missing tail as
above, then join the edit back together. Since this
would involve mixing more than two elements, 
sometimes more than one (stereo or surround) 
track is necessary for the brief mix. 

Editing and assembling concert albums can be
a great pleasure. The edited concert album is the
perfect example of the principle of willful
suspension of disbelief because real-life applause is
almost never as short as 15 or 20 seconds, and
real-life artists have to stop to tune their instruments.

The object is to prune the concert down to its 
essence so that the home listener is never bored on 
replay. Editing applause is an art; you have to be 
familiar with the feel of natural applause. Cutting 
applause and ambience between different 
performances exercises the power of the 
workstation’s crossfades. There can never be silence 
between numbers, there must be some degree of 
roomtone (audience ambience). The room tone 
which precedes a quiet number has a very different 
feel than the sound of the audience at the end of a 
loud one, and it is necessary to create an 
imperceptible transition between the two. My 
approach is to do the major cutting on one pair of 
tracks (for stereo), and wherever it needs
transitional help, mix in a bed of compensating ambience
on another track pair. I once put an audience
ambience loop under the only studio cut on a live
album, and to this day no one has been able to figure
out which track is the ringer! 

V. Leveling The Album 

The greater a recording’s dynamic range, the 
harder it is to judge "average level” and you have to 
listen in several spots. I usually start with the 
loudest song on the album and find its highest
point. I then engineer the processing to create the
impact I'm looking for, hold the monitor at the
predetermined gain, and make the rest of the songs
work together at that monitor gain. The rest of the
album falls in line once the loudest song has its
proper level and impact. During the processing of
this loudest song, it’s important to ensure the chain
of processors are in their optimum gain without
overload: this is the test for the rest of the album.
These days, digital limiters keep from going "over
level" (distorting the digital system), although a
limiter pushed too hard produces a squashed and
unpleasant sound (see Chapters 9-11 on dynamics).

The ear judges level by comparison to the
surroundings, and adapts to loud and soft passages
by lowering and raising its human gain. Thus, a soft
beginning may seem too soft following a loud
climax, but the same level would be fine in the
context of the middle of a song. And a loud passage
following a silence seems even louder. That’s why
you have to pay attention to context when judging
apparent levels. Leveling and dynamics processing
are inseparable, for the output (makeup gain) of the
processors also determines the song’s loudness
compared to the others (see Chapter 10). A more
compressed song may sound louder than another
even if its peaks don’t hit full scale (0 dBFS). If you
change the processing, you have also changed its
level, so it’s all done by ear. After working on the
loudest song and saving the settings, I usually go to
the first song and work in sequence. Then the
second song, and next I check the transition
between the first and second. In a good mastering
room, this transition will usually work without any
fine-tuning because we've been monitoring at a
consistent gain while doing our decision-making. If
one song appears too loud or soft in context, I make
a slight adjustment in level until they work together,
or sometimes increase the spacing to "clear the
ear." If the first song is hot and up-tempo and the
second begins quietly, it is sometimes necessary to
turn up the intro of the second song so it will work
in context. So you can see why it’s important to have
the album in proper order before mastering!

Extra-soft beginnings, endings or even middle 
spots require special attention. Meter readings 
are fairly useless in this regard; only experience 
will tell us when something is too soft and has to be 
raised. In Chapters 9 thru 11, we’ll get into some 
manual and automatic techniques for altering 
internal dynamic range. 

Ear Fatigue? After leveling and processing the 
last song, I always review song numbers one and
two, to make sure they still fit well into the context.
There may be a tweak that can further optimize the
first couple of songs. Or, I might find that the album
has been growing in amplitude due to ear fatigue 
and the latter songs may need to be lowered. 

The Domino Effect 

Overzealous leveling practice (where the 
engineer or producer is trying to make every song
super-hot) can produce a Domino Effect. Suddenly,
the song which used to be the loudest doesn’t sound
as loud as it did before. This is psychoacoustics at
work, or possibly listening fatigue. Not every song
can be the loudest! If the loudest song was good
enough before, the problem may be the
unintentional escalation. Instead of trying to push the
loudest song further, thereby squashing it with the
limiter, I try to lower the previous song by even a
few tenths of a dB, which will restore the impact of 
the next song by use of contrast. 


Chapter 8
Equalization Techniques

I. Introduction

Interaction

Mastering is the art of compromise. It is the art
of knowing what is sonically possible, and then
making informed decisions about what is most
important for the music. The first principle of
mastering is this: Every action affects everything
else. This principle means that we cannot just
import practices from elsewhere into the mastering
room. Equalization practice is an especially clear
case of where a technique used in mastering is
crucially different from an apparently similar
technique used in mixing. For example, when
mastering, adjusting the low bass of a stereo mix
will affect the perception of the extreme highs.
Similarly, if a snare drum sounds dull but the vocal
sounds good, then nine times out of ten, the voice
will suffer when you try to equalize for the snare.¹
These problems occur even between elements in the
same frequency range: when you work on the bass
drum, for example, the bass guitar will more than
likely be affected, sometimes for the better,
sometimes worse. If the bass drum needs EQ but the
bass instrument is correct, it may be possible with
careful, selective equalization to "get under the
bass" at the fundamental of the drum, somewhere
under 60 Hz. But just as often a bass drum exhibits
problems in its harmonics, which overlap with the
range of the bass instrument. A resonance problem
in the bass instrument may be counteracted by
dipping around 80, 90, 100 Hz... but this can easily
affect the low end of the vocal or the piano or the
guitar. Sometimes we can’t tell if a problem can be
fixed until we try. We should never promise a client
miracles; that way they’re delirious when we can
deliver them!


II. What is a Good Tonal Balance? 


Perhaps the prime reason clients come to us is 
to verify and obtain an accurate tonal balance. The 
output of the major mastering studios is remarkably
consistent, pointing to their very accurate
monitoring. While it is possible to help certain
individual instruments, most of the time our goal is
to produce a good spectral balance. But exactly what
is a "good" tonal balance? The ear fancies the
tonality of a symphony orchestra. On a spectrum
analyser, the symphony always shows a gradual high
frequency rolloff, and so will most good pop music
masters. The amount of this rolloff varies
considerably depending on the musical style and even the
moment in the music, so mastering engineers
rarely* use the spectrum analyser display to make
EQ judgments.

* We don’t use the spectrum analyser to judge musical balance, but it's useful to
have around to reveal problems, e.g. identify noises at discrete frequencies or
ultra high or low frequency noise.

"Practice is the best of all instructions."
(Chinese Fortune Cookie)

Everything starts with the midrange. If the
mid-frequency range is lacking in a rock recording,
it’s just like leaving the violas or the woodwinds out
of the symphony. The fundamentals of the vocal,
guitar, piano and other instruments must be
correct, or nothing else can be made right. The
mastering engineer’s job is to make sure that the
tonal balance is well within the acceptable range,
that things don’t stick out inappropriately, that the
sound is pleasant, warm and clear, and is correct for
the song and the genre. Some pieces of music
require laid-back cymbals, others are just crying out
for an in your face treatment; with the right monitors
and experience it is possible to know that the EQ is
just right.

While we always seek an absolute standard in 
EQ, a recording can have an intentional color, for 
example, a brighter, thinner sound, and the ear will 
"train” itself and learn to accept a slight deviation 
from neutral.* Once the ear has been "trained,” if 
you throw a naturally EQ’d song in the middle of 
this, it will seem fat and muddy by comparison. The 
mastering engineer is there to ensure that the 
deviation from neutral is not excessive because if it 
is then the sound will not translate adequately on
the widest variety of playback systems. We must
recognize when a sibilant vocal is acceptable, or
must be controlled, for esthetic and technical
reasons.³

Specialized Music Genres 

I try to keep the symphonic tonal balance in my
head as a basic reference for most rock, pop, jazz,
world music, and folk music, especially in the mid to
high frequency balance. This works most of the
time. But some specialized music genres
deliberately utilize very different frequency
balances, and for them the symphony ideal is not
appropriate. For example, in some styles of music,
'too much' (or 'too little') bass is just right. You could
think of Reggae as a symphony with lots more bass
instruments whereas punk rock is often extremely
aggressive, thin, loud and bright. Punk voices can be
thin and tinny over a fat musical background, with
the natural fundamental-harmonic relationships
completely strained. When this is done for a whole
record it can be fatiguing, but it can be interesting
and musically special when it’s part of the artistic
variety of the record.*

Be aware of the intentions of the mix 

Equalization (and other processing) affects 
more than just tonality—it can affect the internal 
balance of a mix. So a good mastering engineer must 
be capable of evaluating the mix intentions of the 
producer/engineer/musicians and be sensitive to 
the needs of the production team. We must not 
unintentionally alter carefully-constructed
instrumental interrelationships. For example, raising the
bass level to get a warmer tonality will inevitably
raise the level of, say, the bass instrument compared
to, say, the vocalist. Sometimes this is exactly what
the producer intended, because it is possible that
the lack of warmth will be traced to a monitoring
issue in the mix environment, and the same issues
that caused a lack of warmth could also be reducing
the bass instrument level on an absolute basis.
Regardless, when I feel that I am affecting a balance,
I always discuss my feelings with the producer to
make sure that the balance "fault" which I perceive
was not intentional. 


* Yes, there are artistic punk rock records! I believe that the musical integrity of
the artist determines the worth of a recording, not the style they work in.


III. Equalization Techniques 

Parametric Equalizers 

There are two basic types of equalizers,
parametric and shelving, named for the shape of
their characteristic curve. Parametric EQ is
favoured in recording and mixing. Invented by
George Massenburg circa 1967,⁴ the parametric is
the most flexible curve, providing three controls:
center frequency, bandwidth, and level of boost or
cut. Mix engineers like to use parametrics on
individual instruments, either boosting to bring out
their clarity or salient characteristic, or selectively
dipping to eliminate problems, or by virtue of the
dip, to exaggerate the other ranges. The parametric
is also the most popular equalizer in mastering since
it can be used surgically to remove certain defects,
such as overly-resonant bass instruments. A
simpler (non-parametric) equalizer has fixed
frequency and bandwidth and only the level is
adjustable per band. 

Q’s and Bandwidth 

Equalizer Q is defined mathematically as the
center frequency divided by the bandwidth in Hertz
at the 3 dB down (up) points measured from the peak
(dip) of the curve. A low Q means a high bandwidth,
and vice versa. The figure below shows two parametric
equalizers with extreme levels for purposes of
illustration: On the left, a 17 dB cut at 50 Hz with a
very narrow Q of 4, which is 0.36 octaves. The
bandwidth is 12.5 Hz. On the right, a 17 dB boost
centered at 2 kHz, with a fairly wide (gentle) Q of
0.86, which is 1.6 octaves. The bandwidth is 2325
Hz, represented by the dashed white line.*

[Figure: Parametric equalizer with +17 dB boost centered at 2 kHz with a
fairly wide bandwidth of 1.60 oct (Q = 0.86), indicated by the dashed white
line at the 3 dB down points. A cut of -17 dB at 50 Hz with a very narrow
bandwidth of 0.36 octaves (Q = 4).]
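
The figures quoted for these two equalizers can be checked with a few lines of arithmetic. The sketch below uses the definition given above (Q equals center frequency divided by bandwidth in Hz) and the commonly used conversion from Q to octaves, which assumes the band edges sit symmetrically around the center on a logarithmic frequency scale. The function names are hypothetical.

```python
import math


def bandwidth_hz(center_hz: float, q: float) -> float:
    """Bandwidth in Hz from the definition Q = center frequency / bandwidth."""
    return center_hz / q


def q_to_octaves(q: float) -> float:
    """Bandwidth in octaves for a given Q, assuming band edges placed
    symmetrically around the center on a logarithmic frequency axis."""
    return (2.0 / math.log(2.0)) * math.asinh(1.0 / (2.0 * q))


narrow_bw = bandwidth_hz(50.0, 4.0)      # 12.5 Hz
narrow_oct = q_to_octaves(4.0)           # about 0.36 octaves
wide_bw = bandwidth_hz(2000.0, 0.86)     # about 2325.6 Hz
wide_oct = q_to_octaves(0.86)            # about 1.6 octaves
```

The computed values agree with the 12.5 Hz, 0.36 octave, roughly 2325 Hz and 1.6 octave figures in the text, to within rounding of the Q values.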

The choice of high or low Q 
depends on the situation. Gentle 
equalizer slopes almost always sound 
more natural than sharp ones, so Q’s 
of 0.6 and 0.7 are therefore very
popular. Use the higher (sharper) Q’s
(greater than 2) when you need to be surgical, such
as dealing with narrow band resonances or
discrete-frequency noises. It is possible to work on
just one note with a sufficiently narrow-band
equalizer. I also use higher Q’s when I want to
emphasize an instrument with minimal effect on
another instrument. For example, a poorly-mixed
program may have a very weak bass instrument;
boosting the bass circa 80 Hz may help the bass
instrument but muddy the vocal, in which case I
narrow the bandwidth of the bass boost until it stops
affecting the vocal. The classic technique for finding
a resonance is to focus the equalizer: start with a
large boost (instead of a cut) to exaggerate the
unwanted resonance, and fairly wide (low value) Q,
then sweep through the frequencies until the 
resonance is most exaggerated, then narrow the Q to 
be surgical, and finally, dip the EQ the amount 
desired. 
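One way to see what the boost, frequency and Q controls do during this focusing procedure is to compute the response of a single parametric band. The sketch below uses the widely published peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook rather than any particular product's design. The 48 kHz sample rate, the function names and the example settings are assumptions made for illustration.

```python
import cmath
import math

def peaking_coeffs(f0, gain_db, q, fs=48000.0):
    """Biquad coefficients (b, a) for a peaking (bell) EQ band."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a

def gain_db_at(freq, b, a, fs=48000.0):
    """Magnitude response in dB of the biquad at one frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq / fs)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

if __name__ == "__main__":
    # Wide search boost, then the same boost narrowed, then the final dip.
    for label, gain, q in [("search", 12.0, 0.7),
                           ("narrowed", 12.0, 4.0),
                           ("final dip", -3.0, 4.0)]:
        b, a = peaking_coeffs(250.0, gain, q)
        row = ", ".join(f"{f} Hz: {gain_db_at(f, b, a):+.1f} dB"
                        for f in (125, 250, 500))
        print(f"{label:9s} {row}")
```

Raising Q leaves the gain at the center frequency unchanged but reduces how much the neighboring octaves are affected, which is the point of narrowing the band before applying the final dip.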


Shelving Equalizers 

A shelving equalizer affects the level of the
entire low frequency or high frequency range below
or above a specified frequency. For example, a 1.5
kHz high shelf affects all the frequencies above 1.5
kHz. In mastering, shelving equalizers take on an

* Many equalizers define bandwidth in octaves instead of Q. Appendix 6 contains
a convenient table for converting between Q and bandwidth.




increased role, because we’re dealing with overall 
program material. One interesting variant on the 
standard shelf shape can be found in the Waves 
Renaissance EQ and Manley’s Massive Passive, very 
useful mastering equalizers. This resonant shelf is 
based on research from psychoacoustician Michael 
Gerzon, who believed it to be a very desirable shape. 

I like to think of it as a combination of a shelving 
boost and a parametric dip (or vice versa). In the top 
figure, a low Q (0.71) bass shelf of 11.7 dB below 178
Hz is mollified by a gentle parametric dip above 178 
Hz, all controlled by a single band of the equalizer. 
This is an extreme boost for illustration, but this 
type of curve can be useful to keep a vocal from 
sounding thick while implementing a bass boost. 

Top: Gerzon resonant shelf with a low Q. Bottom: the same with a
high Q. The dip just past the shelving boost frequency is
characteristic of the Gerzon resonant shelf.



The bottom figure shows the same boost with a high 
Q of 1.41. 

Shelving equalizers can have low or high Q, with 
Q defined as the slope of the shelf at its 3 dB up or 
down point. 

Using Baxandall for air 

As I mentioned in Chapter 3, the air band is the
range of frequencies between about 15-20 kHz, the
highest frequencies we can hear. An accurate
monitoring system will indicate whether these
frequencies need help. An air boost is
contraindicated if it makes the sound harsh or
unintentionally brings instruments like the cymbals
forward in the depth picture. Very few people know
of a third and important curve that’s extremely
useful in mastering: the Baxandall curve, named
after Peter Baxandall (pictured at right). Hi-Fi tone
controls are usually modelled around the Baxandall
curve. Like shelving equalizers, a Baxandall curve is
applied to low or high frequency boost/cuts. Instead
of reaching a plateau (shelf), the Baxandall
continues to rise (or dip, if cutting instead of
boosting). Think of the spread wings of a butterfly,
but with a gentle curve applied. You can simulate a
Baxandall high frequency boost by placing a
parametric equalizer (Q = approximately 1) at the
high-frequency limit (approximately 20 kHz). The
portion of the bell curve above 20 k is ignored, and
the result is a gradual rise starting at about 10 k and
reaching its extreme at 20 k (see fig). This shape
often corresponds better to the ear’s desires than
any standard shelf and a Baxandall high frequency
boost makes a great air EQ.
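The simulated Baxandall boost described above can be sketched numerically. This is a hedged illustration, not a model of any specific equalizer: it evaluates the textbook analog bell (peaking) response, with the 20 kHz center and Q of 1 taken from the text and the 4 dB boost chosen arbitrarily.

```python
import math

def bell_gain_db(freq, center, boost_db, q):
    """Gain in dB of an analog-style bell (peaking) curve at `freq`.

    Transfer function: H(s) = (s^2 + s*A/Q + 1) / (s^2 + s/(A*Q) + 1),
    with s = j*freq/center and A = 10**(boost_db/40).
    """
    a_lin = 10.0 ** (boost_db / 40.0)
    s = 1j * freq / center
    num = s * s + s * a_lin / q + 1.0
    den = s * s + s / (a_lin * q) + 1.0
    return 20.0 * math.log10(abs(num / den))

if __name__ == "__main__":
    # Only the part of the bell at or below 20 kHz matters for the air band.
    for f in (2500, 5000, 10000, 15000, 20000):
        gain = bell_gain_db(f, center=20000.0, boost_db=4.0, q=1.0)
        print(f"{f:6d} Hz: {gain:+.2f} dB")
```

Because only the lower half of the bell falls inside the audio band, the gain never levels off into a plateau the way a shelf does; it keeps climbing until it reaches the full boost at 20 kHz.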

Be careful when making high frequency boosts
(adding sparklies). They are initially seductive, but
can easily become fatiguing. In addition, the ear
often treats a high frequency boost as a thinning of
the lower midrange, which completely changes the
intended program balance of the mix. The highs
come up, but for example, the
cymbals, triangle and tambourine also become
louder. Is this consonant with the musical intent? In
accordance with the first principle of mastering, you
must pay attention to the instrumental and vocal
balance as well as the tonal balance whenever
making changes in any EQ range.

Gentle Baxandall curve (pink) vs. sharp Q shelf (black). Many shelving equalizers have gentler curves
and may approach the shape of the Baxandall. Try a shelf with 3 dB per octave slope for this purpose.

High-Pass and Low-Pass Filters 

On the left of the figure is a
sharp high-pass (low cut) filter at 61 Hz, and on the
right, a gentle low-pass (high cut) filter at 3364 Hz.
The frequencies are defined as the points where the
filter is 3 dB down. High-pass and low-pass filters
are used to solve noise problems in mastering but
they can make their own problems as we shall soon
see. They’re hard to use surgically because they
affect everything above or below a certain
frequency. High-pass filters are used to reduce
rumble, thumps, p-pops and other noises. Low-pass
filters are sometimes used to reduce hiss,
though since the ear is most sensitive to hiss in the 3
kHz range, a parametric dip may be more surgical
than the radical pass-filter solution. I rarely apply a
standard filter to reduce hiss except for
short passages, preferring specialized noise-reduction
solutions instead (see Chapter 12).

At left: Sharp high-pass filter at 61 Hz. At right: Gentle low-pass
filter at 3364 Hz.
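To make the 3 dB convention concrete, here is a small sketch. It assumes Butterworth-style filters, which the text does not specify, and the filter orders used for "sharp" and "gentle" are likewise illustrative guesses.

```python
import math

def highpass_gain_db(freq, cutoff, order):
    """Magnitude in dB of an ideal Butterworth high-pass filter."""
    return -10.0 * math.log10(1.0 + (cutoff / freq) ** (2 * order))

def lowpass_gain_db(freq, cutoff, order):
    """Magnitude in dB of an ideal Butterworth low-pass filter."""
    return -10.0 * math.log10(1.0 + (freq / cutoff) ** (2 * order))

if __name__ == "__main__":
    # At the cutoff the response is 3 dB down regardless of steepness.
    print(f"sharp HPF at 61 Hz:    {highpass_gain_db(61.0, 61.0, 4):.2f} dB")
    print(f"gentle LPF at 3364 Hz: {lowpass_gain_db(3364.0, 3364.0, 1):.2f} dB")
    # One octave into the stop band the steeper filter removes far more.
    print(f"sharp HPF at 30.5 Hz:  {highpass_gain_db(30.5, 61.0, 4):.1f} dB")
    print(f"gentle LPF at 6728 Hz: {lowpass_gain_db(6728.0, 3364.0, 1):.1f} dB")
```

Both filters read about 3 dB down at their stated frequencies; what separates sharp from gentle is how quickly the attenuation grows beyond that point.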



EQ yin and yang 

Remember the yin and the yang: Contrasting ranges
have an interactive effect. For example...

• A slight dip in the lower midrange (~250 Hz) can
have a similar effect to a boost in the presence
range (~5 kHz).

• Adding bass will make the highs seem duller and
reducing bass will make the sound seem brighter.

• Adding extreme highs between 15 - 20 kHz will
make the sound seem thinner in the bass/lower
midrange.

• Warming up a vocal will reduce its presence.

Yin and yang considerations imply that you are
likely to be working in two contrasting ranges
at once to assure
that the sound is both warm and clear. Harness the
yin and yang when the level is too high: pick the
frequency band which you can reduce in level.
Harshness in the upper midrange/lower highs can
be combated in several ways. For example, a
harsh-sounding trumpet section can be improved by
dipping around 6-8 kHz, and/or by boosting circa
250 Hz. Either way produces a warmer (sweeter)
presentation, and your choice of which frequency
range to work on will be influenced partly by what
other instruments are playing at the same time as
the trumpets. The next trick is how to restore the
sense of air which can be lost by even a 1/2 dB cut at
7 kHz, and this can often be accomplished by raising
the 15 to 20 kHz range; often only 1/4 dB can do the
trick.⁵ Never forget the first principle: it’s easy to
fall into the trap of concentrating on one element
while forgetting how it is affecting the rest.

One channel or both (all)? 

Most times making the same EQ adjustment in
both (all) channels is the best way to proceed as it
maintains the stereo (surround) balance and the
relative phase between channels. But sometimes it
is essential to be able to alter only one channel’s EQ.
For example, with a too-bright high-hat on the right
side, a good vocal in the middle and proper crash
cymbal on the left, the best solution is to work on
the right channel's high frequencies.

Start subtly first 

Sometimes important instruments need help, 
though, ideally, they should have been fixed in the 
mix. The best repair approach is to start subtly and 
advance to severity only if subtlety doesn’t work. For 
example, if the piano solo is weak, we try to make 
the changes surgically: 

• only during the solo 

• only on the channel where the piano is primarily 
located, if that sounds less obtrusive

• only in the frequency range(s) that help, 
fundamental, harmonic, or both 





• only as a last resort by raising the entire level, 
because a keen ear may notice a change when the 
gain is brought up 

Realize the limitations of the recording 

There is only so much that can be accomplished
in the mastering, and waiting until the mastering
stage to fix certain problems usually produces
compromise. There is little we can do to fix a
recording where one instrument or voice requires
one type of equalization and the rest requires
another.* For example, rolling off the low end to
correct a heavy synth bass is sure to lose the punch
of the bass drum. Or brightening a vocal can make
the tambourine sound fatiguing. In these cases I
often recommend a remix. If a remix is not possible,
then we resort to specialized techniques such as M/S
equalization or multiband dynamics (compression/
expansion) to bring out a weak instrument or hide
another, which can produce fabulous results,
sometimes indistinguishable from a remix (we
explain M/S in Chapter 13). But the better the mix
we get, the better the master we can make, which
implies that a perfect mix needs no mastering at all!
Even so, it is worth the time to get the approval of an
experienced mastering engineer working in a
neutral monitoring environment, even if she
decides that no mastering or polishing is needed.

Instant A/B’s? 

With good monitoring, equalization changes of 
less than 1/2 dB are audible. I believe that instant 


* Bernie Grundman calls this a recording which is "not uniform," as quoted in The
Mastering Engineer's Handbook (see Appendix 10).
† This is a fundamental part of the see-saw arguments for and against blind
testing methods, something which we will not cover in this book.


A/B comparisons deceivingly hide the fact that a
subtle change has been made, as the change will
only be noticed over time.† I will
take an equalizer in and out to confirm
initial settings, but I never make
instant EQ judgments. Music is so fluid from moment to
moment that changes in the music will be confused
with EQ changes. I usually play a passage for a
reasonable time with setting "A" (sometimes 30
seconds, sometimes several minutes), then play it
again with setting "B." Or, I play a continuous
passage, listening to "A" for a reasonable time
before switching to "B." For example, over time it
will become clear whether a subtle high frequency
boost is helping or hurting the music.

Fundamental or Harmonic?

The extreme treble range mostly contains
instrumental harmonics. Surprisingly, the
fundamental of some crash cymbals can be as low as
1.5 kHz or below. When equalizing or processing bass
frequencies, it is easy to confuse the fundamental
with the second harmonic. The detail shot of a
SpectraFoo™ Spectragram in Color Plate Figure C8-01
illustrates the importance of the harmonics of a bass
instrument. High amplitudes are indicated in red,
descending levels in orange, yellow, green, then
blue.

Notice the parallel run of the bass instrument’s
fundamental from 62-125 Hz and its second and
third harmonics from 125-250 and up. Should we
equalize the bass instrument’s fundamental or the
harmonic? It’s easy to be fooled by the octave
relationship; the answer has to be determined by
ear: sometimes one, the other or both. To find out
which is most important, I use the focusing
technique, sweeping the equalizer from the
fundamental to the harmonic. But in mastering we
may not have the liberty of choice, since the
equalizer may simultaneously affect the bass
instrument, bass drum, and the low end of the
piano, guitar, vocals, etc. It might be necessary to
choose the frequency which has the least effect on
other instruments rather than the ideal one for the
focal instrument. It’s also a matter of feel; in a
rhythm piece, we can forgo delicacy and make it kick
with a general bass boost.*

Bass boosts can create serious problems 

Since the ear is significantly less sensitive to
bass energy, bass information eats up lots more
power (6 to 10 dB) for equal sonic impact below
about 50 Hz, and requires about 3-5 dB more
between 50 and 100 Hz.⁶ This means that our low
frequency equalization practice may use up so much
energy that it affects the loudest clean level we can
give to a song. It also explains why bass instruments
often have to be compressed to sound even.
Historically, the high pass filter was our best friend
when we made LPs, to prevent excess groove
excursion and obtain more time per LP side. Digital
media do not have this physical problem, but the
psychoacoustic problem of the ear’s low frequency
insensitivity still exists.
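To put those decibel figures in perspective, the short sketch below converts a level difference in dB to a power ratio using the standard relation ratio = 10^(dB/10). It is an added illustration, and the function name is arbitrary.

```python
def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in decibels."""
    return 10.0 ** (db / 10.0)

if __name__ == "__main__":
    for db in (3, 5, 6, 10):
        print(f"+{db} dB needs about {db_to_power_ratio(db):.1f}x the power")
```

On this scale, the 6 to 10 dB quoted for the region below about 50 Hz corresponds to roughly four to ten times the power, and the 3 to 5 dB quoted between 50 and 100 Hz to roughly two to three times.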

One possible way to save "energy" is to use a
fairly sharp high pass (low cut) filter somewhere
below 40 Hz, which does not significantly affect the
energy of the bass drum or the low notes of the bass.
I do not make this decision lightly as many
recordings sound better flat; the monitor system’s
woofers must have calibrated, extended response
for this judgment. The high pass filter must be
extremely transparent and have low distortion.
During mastering, I listen carefully, switching a
filter in and out to determine if it is helping or
hurting. Sometimes a gentle filter is a better choice
than a steep one, as when dealing with a boomy bass
drum or bass. But subsonic energy, rumble or
thumps require a steep filter to have minimal effect
on the instruments. When "uncoloring" a
resonance, a fairly narrow parametric filter tuned to
the offending frequency is also a good choice.

Mix engineers working with limited bandwidth 
monitors run the risk of producing an inferior 
product. Subwoofers permit you to hear low
frequency leakage problems that tend to muddy up 
the mix, for example, bass drum leakage in vocal 
and piano mikes. It’s much better to apply selective 
high pass filtering during the mixing process 
because mastering filters will affect all the 
instruments in a frequency range. For example, mix 
engineers can usually get away with a steep 80 Hz 
filter on an isolated vocal, but it’s extremely rare to 
see a mastering engineer use one on a whole mix. A 
mixing engineer should form an alliance with a 
mastering engineer, who can review her first mix 
and alert her to potential problems before they get 
to the mastering stage. 

* If that's what the piece needs. I shudder to think that readers may take each
recommendation in this chapter literally, and apply it to their work. Mastering
engineers do not automatically equalize: we always listen and evaluate first.
Many pieces leave mastering with no equalization at all.





IV. Other refinements 

Linear-phase Equalizers 

All current analog equalizer designs and nearly
all current digital equalizers produce phase shift
when boosted or cut; that is, signal delay varies with
frequency and the length of the delay changes with
the amount of boost or cut. High-Q filters produce the
most phase shift. This kind of filter will always alter
the musical timing and wave shape, also known as
phase distortion. Daniel Weiss says,

[In contrast] a particular type of
digital filter, called the Symmetric FIR
Filter, is inherently linear-phase.⁷ This
means that the delay induced by
processing is constant across the
whole spectrum, unconstrained by
EQ settings.*

Since FIR filters are expensive to implement in
real time, linear-phase equalizers have only
recently appeared. Rather than FIR filters, the Weiss
uses a complementary IIR technique to obtain
linear-phase. This technique seems to avoid one of
the downsides of the FIR approach, which can
produce weird results at certain frequencies unless
they use extreme computing power (MIPS).
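The linear-phase property of a symmetric FIR filter can be verified directly. The sketch below is illustrative only (it is not the Weiss algorithm, which the text says is IIR-based): it builds a small symmetric windowed-sinc low-pass, then checks that removing a constant delay of (N-1)/2 samples leaves a purely real frequency response, which is what a constant delay at all frequencies implies. The tap count, cutoff and Hamming window are arbitrary assumptions.

```python
import cmath
import math

def symmetric_lowpass(num_taps=31, cutoff=0.1):
    """Windowed-sinc low-pass taps; cutoff is in cycles per sample."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window, itself symmetric about the middle tap
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def response(taps, freq):
    """Complex frequency response at freq (cycles per sample)."""
    return sum(h * cmath.exp(-2j * math.pi * freq * n)
               for n, h in enumerate(taps))

def delay_compensated(taps, freq):
    """Response with the constant (N-1)/2 sample delay removed."""
    mid = (len(taps) - 1) / 2.0
    return response(taps, freq) * cmath.exp(2j * math.pi * freq * mid)

if __name__ == "__main__":
    taps = symmetric_lowpass()
    for f in (0.01, 0.05, 0.2, 0.4):
        h = delay_compensated(taps, f)
        print(f"f={f:.2f}: real={h.real:+.5f} imag={h.imag:+.1e}")
```

The imaginary part stays at rounding-error level at every frequency, so the only phase the filter adds is the fixed delay; a minimum-phase (analog-style) filter would not pass this check.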

John Watkinson believes that much of the
audible difference between EQs comes down to the
phase response.† I don't think engineers have a
good handle on the sonic deteriorations of phase
shift in equalizers; after my first linear-phase
experience, it was hard to go back. To my ears, the
linear-phase sounds more analog-like than even
analog! The Weiss has a very pure tone and seems to
boost and cut frequencies without introducing
obvious artifacts. Ironically, while mastering a punk
rock recording, it proved too sweet in linear-phase
mode so I had to return to normal mode to give the
sound some grunge. So clearly many of the qualities
we’ve grown accustomed to in standard equalizers
must be due to their phase shift.

Most times I choose linear-phase mode. But 
both filter designs have their Achilles’ heels. 

Whenever you have to equalize,
you will alter the signal in both the
time and frequency domains (as
mathematics requires); there will
always be a time artifact. In the
analog style equalizer, which is usually
mathematically termed minimum-phase,
the alteration will be primarily
to spread the signal downstream,
i.e. it does not lead the original signal by
much, if any. A downstream modification
translates into different delays
at different frequencies, dispersing the
original signal. In some cases this
effect is quite audible. If one uses a
digital approach, one can either mimic
the analog behavior, or use a linear-phase,
aka constant delay, filter. This
filter will equally precede and follow
the signal; part of the filter may create
a pre-echo effect, modifying the

* Described by Daniel Weiss at the Weiss website, http://www.weiss.ch.
† Studio Sound Magazine, 9/97.





leading edge of transients and signal
changes. A high Q linear-phase filter
can introduce audible pre-echo in the
short millisecond range; it’s exactly
like a floor bounce but without the
comb-filtering. Any time that a high
Q filter is used, careful listening with
both types of equalization may be
necessary to decide which choice
is best.⁸

Neither approach is fundamentally better. The
minimum phase (analog-style) equalizer tends to
smear the depth and imaging, and occasionally that
artificial smearing produces a pleasantly vague
image. The linear phase equalizer can subtly
deteriorate transient response. It might be a good
idea for manufacturers to allow us to select filter
types per band; I might choose minimum-phase for
a steep high pass, and linear phase for a gentle
presence boost.

Dynamic Equalization 

Multiband dynamics processing can also be 
treated as dynamic equalization, where the time 
constants or thresholds have little effect on the 
actual dynamics but rather more on the tonal 
balance at different amplitudes. Dynamic equalizers 
emphasize or cut low, mid or high frequencies 
selectively at either low levels or at high levels. 

These can be used as noise or hiss gates, rumble
filters that only work at low levels (especially useful
for traffic control in a delicate classical piece),
sibilance controllers, or ambience enhancers. They
can enhance inner details of high or low frequencies
at low levels, where details are often lost. They can
be used to reduce harshness, enhance clarity at high
levels or for other purposes, as described in detail in
Chapter 10.


1 We're always seeking techniques (beyond simple equalization) to isolate one
instrument from another, and it is possible to greatly improve the impact and
clarity of the snare and other percussion instruments without changing the
tonality of the vocal, using upward expansion with just the right attack and
release times. It's frequently possible to enhance or punch a bass drum without
significantly affecting the bass instrument, by using selective-frequency
dynamics processing. And so on. See Chapters 10-11.

2 We all believe we have "the absolute sound" in our heads, but are surprised to
learn how much tonal variance is tolerable as the ear/brain accommodates.
Similarly, the eye accustoms itself to varying color temperatures, which only
call attention to themselves when they change. A good photographer can
usually identify Ektachrome from Kodachrome, but both look good on their
own, and their color difference primarily shows up when you place two slides
side by side.

3 Technically, sibilance can wreak havoc with the high frequency limiters in FM
radio which are there to handle a preemphasis boost. An over-sibilant vocal can
cause the radio limiters to clamp down and lose definition; in the extreme, the
sound will bounce and words will be lost at the rate of the radio limiter's
recovery time. Thus, overly bright records can sound dull on the air; brightness
is self-defeating when it comes to radio processing.

4 In 1967, young George Massenburg began the search for a circuit which would
be able to independently adjust an equalizer's gain, bandwidth and frequency.
The key word is independent, for most analog circuits fail in this regard and the
frequency, Q, and gain controls interact with each other. He called this circuit a
parametric equalizer and his circuit remains proprietary today.

5 Moving coil cartridges sometimes have a dip in the 8 kHz range and a rise from
10 to 20 kHz, which gives them a sweet sound, amounting to a tone control in
the reproduction system. I prefer my reproduction system to be neutral and to
correct problems in the program material itself. But since a lot of older
program material was equalized on lower resolution monitor systems, it makes
sense to have a tone control in your home playback system.

6 This is dictated by the psychoacoustic equal loudness curves, first researched by
Harvey Fletcher and Wilden Munson in the 1930's.

7 FIR stands for Finite Impulse Response, and IIR for Infinite Impulse
Response. Readers interested in a detailed theoretical explanation of the
difference between FIR and IIR filters should invest a little time in John
Watkinson's The Art of Digital Audio.

8 Jim Johnston, in correspondence.



Chapter 9

How To Manipulate Dynamic Range for Fun and Profit

PART ONE: Macrodynamics


I. The Art of Dynamic Range 

Dynamic Range is defined as the ratio between
the loudest and softest passages of the body of the
music; hence it should not be confused with
loudness or absolute level; the term dynamic range
is only concerned with differences. For popular
music, this is typically only 6 to 10 dB, but for some
musical forms it can be as little as a single dB or as
great as 15 (very rare). In typical pop music, soft
passages 8 to 15 dB below the highest level are
effective only for brief periods, but in classical, jazz
and many other acoustic forms, soft passages can
last several minutes.
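As a rough numerical illustration of a ratio between loud and soft passages, the sketch below measures the RMS level of two passages and reports their difference in dB. Using RMS is one simple assumption about how to quantify a passage's level, not the author's measurement method, and the sine-wave passages are synthetic stand-ins for real audio.

```python
import math

def rms_db(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def dynamic_range_db(loud_passage, soft_passage):
    """Level difference in dB between a loud and a soft passage."""
    return rms_db(loud_passage) - rms_db(soft_passage)

def sine(amplitude, num_samples=4800, freq=440.0, fs=48000.0):
    """Synthetic test passage: a sine wave of the given peak amplitude."""
    return [amplitude * math.sin(2.0 * math.pi * freq * n / fs)
            for n in range(num_samples)]

if __name__ == "__main__":
    loud = sine(0.5)
    soft = sine(0.2)
    print(f"dynamic range: {dynamic_range_db(loud, soft):.1f} dB")
```

Because only the difference is reported, scaling both passages by the same gain leaves the result unchanged, which mirrors the point that dynamic range is about differences rather than absolute level.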

Microdynamics and Macrodynamics 

The art of manipulating dynamics may be
divided into Macrodynamics and Microdynamics. I
call music’s rhythmic expression, integrity or
bounce, the microdynamics of the music. I call
macrodynamics the loudness differences between
sections of a song or song-cycle. Usually dynamics
processors (such as compressors, expanders) are
best for microdynamic manipulation, and manual
gain riding is best for macrodynamic manipulation.
The micro- and macro- work hand in hand, and
many good compositions incorporate both
microdynamic changes (e.g. percussive hits or
instantaneous changes) as well as macrodynamic
(e.g., crescendos and decrescendos). If you think of
a music album as a full-course meal, then the
progression from soup to appetizer to main course
and dessert is the macrodynamics. The spicy impact
of each morsel is the microdynamics. In this
chapter we concentrate on macrodynamics.

MYTH: "Of course I’ve got dynamic range. I’m playing as loudly as I can!"*

* A common misconception. Thanks to Gordon Reid of Cedar for contributing
this audio myth.

Dynamics in Musical History 

Dynamic changes became very important to
western music sometime between the medieval
Gregorian chants and the classical period, when
composer Franz Josef Haydn surprised us with
perhaps the first example of simultaneous micro-
and macrodynamics.¹ Since ancient times, many
"non-western" styles, such as African, Afro-Caribbean,
Eastern, Indian, Balinese and other
Oriental music forms, have stressed rhythm
(microdynamics, especially in the form of
percussion) as much as melody, and in the twentieth
century of integration, heavy percussive rhythm
became extremely important to western musical
forms as well.²

Any genre that does not grow in musicality will
quickly die, and dynamic contrast plays a big
musical role. Today’s Rap and Hip-hop music has
taken a 250-year-old lesson from classical
composition, by beginning to incorporate a
melodic and harmonic structure. The
genre can further
grow and avoid sounding tiresome by expanding its
dynamic range, adding surprises. Silence and low
level material create suspense that makes the loud
parts sound even more exciting. Five big
firecrackers in a row just don’t sound as exciting as
four little cherry bombs followed by an M80. Radio,
TV and Internet distribution are currently too
compressed to transmit the joy of wide dynamic
range, but it sure turns people on at home, and also
in the motion picture theater.

"The soundtrack for the movie The Fugitive is mixed like a relentless,
fatiguing music single. Titanic was mixed like a beautiful record album."

Films provide an ideal framework to study the
creative use of dynamic range. The public is usually
not consciously aware of the effect of sound, but it
can play a role in a film’s success. I think the movie
The Fugitive succeeded because of its drama, but
despite an aggressive, compressed, fatiguing sound
mix. From the beginning bus ride, with its
super-hot dialog and effects, all the crashes were
constantly loud and overstated, completely
destroying the impact of the big train crash. I can
hear the director shouting, "more more more" to
the mix engineers. Haven't they heard of the term
suspense? Because when everything is loud, then
really, nothing is loud. In contrast, the sound mix of
'97’s biggest movie, Titanic, is a masterpiece of
natural dynamic range. The dialog and effects at the
beginning of the movie are played at natural levels,
truly enhancing the beauty, drama and suspense for
the big thrills at the end. Kudos to director James
Cameron and the Skywalker Sound mix team for
their restraint and incredible use of dynamic range.
That’s where the excitement lies for me.

Life Imitates Art?

Clearly, modern recording techniques and
equipment have aided in the creation of whole new
musical styles, for example, hip hop, which uses
digital editing and processing to create the beats of
the music in a highly compressed, often
low-dynamic-range style.³ This is basically an extension
of a trend in popular music that began many years
ago with the invention of electric instruments and
amplifiers, and has accelerated exponentially with
modern recording techniques and powerful digital
processors. Successive styles have incorporated less
and less dynamic range, both macrodynamics and
microdynamics. Going hand in hand with this trend
is an exponential increase in distortion from style to
style and year to year. This may very well be due to a
vicious circle that is centered in the mastering
engineer’s hands, for inevitably, most masters tend
to be more compressed than the sources,⁴ and what
sources do recording engineers listen to for
inspiration? Mastered records! We may have bred
the very disease which we seek to eliminate!

While I find the current high-distortion trend
very fatiguing and unlistenable after short periods
of time, we must remember that one man’s meat is
another man’s poison, never more true than in the case
of popular music. Musical and sound styles have
been created out of the very results of pushing
digital compressors beyond their usual settings, for
example, sound qualities such as squashing and
shred. Which is why the successful mastering
engineer must be familiar with and enjoy listening
to many musical styles and sounds, including
perhaps those sound qualities that would not
normally be considered clean by practicing
engineers. I simply hope that the cycle has reached
its peak, since there’s nowhere to go but back down,
when music has dynamic range of 3 dB and
distortion that tears the hair out of one’s ears. In
due time, these new styles will become assimilated
into the larger musical vocabulary, and we can hope
that decent and exciting dynamics will return as a
rule rather than the exception.


The Art of Decreasing Dynamic Range 

The dynamics of a song or song cycle are critical
to creative musicians and composers. As engineers,
our internal sound quality reference should be the
sound quality of a live performance; we should be
able to tell by listening if a recording will be helped
or hurt by modifying its dynamics. Many recordings
have already gone through several stages of
transient-destroying degradation, and
indiscriminate or further dynamic reduction can
easily take the clarity and the quality downhill.
However, usually the recording medium and
intended listening environment simply cannot keep
up with the full dynamic range of real life, so the
mastering engineer is often called upon to raise the
level of soft passages, and/or to reduce loud
passages, which is a form of manual compression.⁵
We may reduce dynamic range (compress) when the
original range is too large for the typical home
environment, or to help make the mix sound more
exciting, fatter, more coherent, to bring out inner
details, or to even out dynamic changes within a
song if they sound excessive.⁶

Experience tells us when a passage is too soft.
The context of the soft passage also determines
whether it has to be raised. For example, a soft
introduction immediately after a loud song may
have to be raised, but a similar soft passage in the
middle of a piece may be just fine. This is because
the ears self-adjust their sensitivity over a medium
time period, and may not be prepared for an
instantaneous soft level after a very loud one. Thus, meter
readings are fairly useless in this regard. How soft is
too soft? The engineers at Lucasfilm discovered that
having a calibrated monitor gain and a dubbing
stage with NC-30* noise floor do not guarantee that
a film mix will translate to the theatre. During
theatre test screenings, some very delicate dialogue
scenes were "eaten up" by the air
conditioning rumble and audience
noise in a real theatre. So they created a
specially-calibrated noise generator,
added to the mixing studio’s monitor
system, labeled "popcorn noise," which
could be switched on whenever they
wanted to check a particularly soft
passage. For similar purposes, the
"typical" (alternate) listening room we
have at Digital Domain has a ceiling fan
and other noisemakers. Whenever I
have a concern, I start the DAW playing a loud
passage just before the soft one, and take a walk to
the noisy listening room. If the soft passage seems a
bit too soft in comparison to the loud one, it will be
obvious in there.

The Art of Increasing Dynamic Range... 

.. .can also make a song sound more exciting, by 
using the art of contrast or by increasing the 
intensity of a peak, for much of the impact of a song 
comes from its internal dynamics and transients. 
The trick is to recognize when an enhancement has 
become a defect—musical interest can be enhanced 
by variety, but too much variety is just as bad as too 
much similarity. Musical taste, experience and a 
great monitor system are required to make these 
judgments. Increasing dynamic range is known as
expansion. Another reason to expand is to restore,
or attempt to restore, the excitement of dynamics
which had been lost due to multiple generations of
compression or tape saturation; in this case we are
increasing the recorded range.

The Four Varieties of Dynamic Range Modification 


We always use the term Compression for the
reduction of dynamic range and Expansion for its
increase. There are two varieties of each: upward
compression, downward compression, upward
expansion, and downward expansion, as
illustrated in the accompanying figure.

Downward compression is the most popular
form of dynamic modification, taking high level
passages and bringing them down. Limiting is a
special case: downward compression with a very
high ratio (to be explained in Chapter 10). Examples
include just about every compressor or limiter you
have ever used. For clarity in this book, we will
always use the short term compressor to mean
downward compressor unless we need to
distinguish it from upward compressor.

Upward compression takes low level passages
and brings them up. Examples include the encode
side of a Dolby® or other noise reduction system,
the AGC 7 which radio stations use to make soft
things louder, and the type of compressor
frequently used in inexpensive video cameras and
consumer VCRs. In Chapter 11 we will introduce you
to a powerful upward compression technique that is
extremely transparent to the ear.

[Figure: the four varieties of dynamic range modification. Each panel compares the original dynamic range with the new dynamic range, from soft to loud. Downward compression and upward expansion act on the loud end; upward compression and downward expansion act on the soft end.]

Any combination of these four processes may be employed in a mastering session.

* A room with an NC-30 rating is very quiet.

Upward expansion takes high level passages
and brings them up even further. Upward expanders
are very rare and very precious, for in skilled hands
they can be used to enhance dynamics, increase
musical excitement, or restore lost dynamics.
Examples include the peak restoration process in
the playback side of a Dolby SR, the DBX Quantum
Processor, the various Waves brand dynamics
processors, and the Weiss DS1-MK2 when used with
ratios less than 1:1 (to be explained).

Downward expansion is the most common type
of expansion: it takes low level passages and brings
them down further. Most downward expanders are
used to reduce noise, hiss, or leakage. A dedicated
noise gate is a special case: downward expansion
with a very high ratio (to be explained). Examples of
downward expanders include the classic Kepex and
Drawmer gates, Dolby and similar noise reduction
systems in playback mode, expander functions in
multi-function boxes (e.g., Finalizer), and the gates
on recording consoles. For clarity in this book, we
will use the simple term expander to mean the
downward type unless we need to distinguish it from
the upward type.
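
The four varieties can be pictured as four simple maps from input level to output level. What follows is only a sketch under stated assumptions: the function name `process_level`, the hard knee, the -20 dB threshold and the example levels are all invented for illustration, and no real processor is being modeled. Ratios are written as input change to output change, so a value below 1 stands for the "less than 1:1" expansion ratios mentioned above.

```python
def process_level(level_db, threshold_db, ratio, region):
    """Map an input level (dB) to an output level (dB), hard knee.

    ratio is input change : output change. Above 1 it compresses,
    below 1 it expands. region picks which side of the threshold
    is processed: "above" (loud passages) or "below" (soft passages).
    """
    if region not in ("above", "below"):
        raise ValueError("region must be 'above' or 'below'")
    distance = level_db - threshold_db
    affected = distance > 0 if region == "above" else distance < 0
    if not affected:
        return level_db  # unity gain on the untouched side
    return threshold_db + distance / ratio


# Hypothetical program: loud passage at -10 dB, soft passage at -50 dB,
# so the original dynamic range is 40 dB. Threshold chosen in between.
LOUD, SOFT, THRESHOLD = -10.0, -50.0, -20.0

varieties = {
    "downward compression": ("above", 2.0),
    "upward expansion": ("above", 0.5),
    "upward compression": ("below", 2.0),
    "downward expansion": ("below", 0.5),
}

new_range = {}
for name, (region, ratio) in varieties.items():
    loud_out = process_level(LOUD, THRESHOLD, ratio, region)
    soft_out = process_level(SOFT, THRESHOLD, ratio, region)
    new_range[name] = loud_out - soft_out
    print(f"{name:21s} loud {loud_out:6.1f}  soft {soft_out:6.1f}  "
          f"range {new_range[name]:5.1f} dB")
```

Both compressions leave a range smaller than the original 40 dB and both expansions leave a larger one; the only difference within each pair is which end of the range gets moved.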


II. The Art of Manual Gain-Riding: 
Macrodynamic Manipulation 

In General 

Level changes need to be made in the most
musical way. To this end, internal level changes are
least intrusive when performed manually (by raising
or lowering the fader), as little as a 1/4 dB at a time,
as opposed to using processors such as compressors
or expanders, which tend to be more aggressive.

When gain riding, rock the boat the right way;
try to go with the waves, don’t fight them. If the
musicians are trying for upward impact, pulling the
fader back during a crescendo can be devastating
since taking the fader down during a peak
diminishes the intended impact. If you’re doing a
live recording and you sense the musicians are
going to overload the recorder, you’re already too
late. The best case scenario is to use your sixth sense
as early as possible, and lower the fader as slowly as
possible, and only enough to fix the anticipated
problem. An experienced live recording engineer
will log where she made such changes, so that the
original dynamic range may be restored by
reversing the moves in post-production. Another
trick is to measure peak levels during rehearsal, and
assume the concert will have a peak at least 3 dB
hotter! Having calibrated faders makes that
adjustment easier. The art of manual leveling can
really improve a production. We can enhance a great
rock or pop mix during mastering, first by
discovering any inappropriate level changes that the
mix engineer may have missed, and by reversing
them we can restore or enhance where the music is
trying to go. I’ve heard many a rock piece where the
climax was emasculated because the mix engineer
kept on dropping the master fader to keep from
overloading. In mastering we can correct for this
unintentional error with delicate changes; it’s
amazing what a dB here or there can accomplish. It’s
also our responsibility to check with the client in
case their level change was intentional! A great rock
and roll mix is extremely rare; during mixing it’s
really hard to simultaneously pay attention to the
internal balances as well as the dynamic movement
of the music between, for example, verse and
chorus. A sensitive mastering engineer will take a
well-balanced mix the rest of the way; you may not
even realize what was missing or how much it can be
enhanced until you hear the mastered version. We
try to enhance those moments where it should have
swelled or dipped, for this is where some of the
excitement of the song can be generated.

How and When to Move the Fader 

Extra-soft beginnings, endings or even middle
spots require special attention. If the highest point
in the song sounds "just right" after processing, but
the intro sounds too soft, it’s best to simply raise the
intro, finding just the right editing method to
restore the gain to normal after the intro using one
or more of these approaches:

• Sometimes a long, gradual decrescendo is the
solution, which might occur at the end of the
intro, or slowly during the first verse of the body.

• Sometimes a series of 1/4 or 1/3 dB edits, taking
the sound down step by step at critical moments.
This is useful when you don’t want the listener to
note that you’re cheating the gain back down and
you may be forced to work against the natural
dynamics.

• Sometimes a quick edit and level change at the
transition between the raised-level intro and the
normal-level body creates a nice effect and is the
least intrusive.

The reverse approach, that is, purposely 
creating a softer intro so that the body of the song 
seems louder and has impact on the entrance can 
also work. In this case, the quick edit (gain change) 
between intro and body provides dramatic impact. 

The Art of Changing Internal Levels of a Song 

Some soft passages must be raised. But if the
musicians are trying to play something delicately,
pushing the fader too far can ruin the effect of the
soft passage. The art is to know how far to raise it
without losing the feeling of being soft, and the ideal
speed to move the fader without being noticed. In a
DAW, physical fader moves are replaced by
commands, crossfades, or by drawing on a
volume/time line. The true magic of the mastering
engineer is to be so invisible that no one knows you
have anything up your sleeve; if they think the sound
is being manipulated, you haven’t done your job. 8
Here’s a technique for decreasing the dynamic
range in the least damaging and most helpful way.

I learned this over 30 years ago from Alec Nisbett’s
book The Technique of the Sound Studio (see
Appendix 10). When doing it live, you must know
the score, to anticipate the moves of the musicians.
But after the fact, on a digital audio workstation it’s
real easy, for the waveform is the score. Supposing
that you must take a loud passage down. The best
place to take the level down is at the end of the

Chapter 9 114 




preceding soft passage before the loud part begins.
Look for a natural dip or decrease in energy prior to
the beginning of the crescendo, and apply the gain
drop during the end of the soft passage before the
crescendo begins. That way, the loud passage will
not lose its comparative impact, for the ear judges
loud passages in the context of the soft ones.

The figure at right from a Sonic Solutions
workstation illustrates the technique. The gain
change is accomplished through a crossfade from
one gain to another.

The producer and I decided that the shout chorus
of this jazz piece was a bit overplayed and had to be
brought down from triple to double forte (which
amounted to a dB or so). 9 To retain the contrast, the
trick is to drop the level during the soft passage just
before the drum hit announcing the shout chorus.
You’ll see this in the 12 second crossfade from unity
gain (top panel) to -1.5 dB gain (bottom panel); the
drum hit is just to the right of the crossfade box. If
done right, you’ll still feel goose bumps as the
musicians make a delicate soft move (now enhanced
with a further decrescendo by the mastering
engineer), and then hit you with the chorus.
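
For readers who think in numbers, the 12 second move from unity to -1.5 dB can be sketched as a gain ramp. This is a rough illustration only: the function names are made up, the ramp is assumed to be linear in dB (the actual crossfade shape used on the workstation is not stated), and the 10 second start time and tiny sample rate are placeholders chosen so the example runs instantly.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)


def ramp_gain_db(t, ramp_start, ramp_length, from_db, to_db):
    """Gain in dB at time t (seconds) for a ramp that is linear in dB."""
    if t <= ramp_start:
        return from_db
    if t >= ramp_start + ramp_length:
        return to_db
    fraction = (t - ramp_start) / ramp_length
    return from_db + fraction * (to_db - from_db)


def apply_gain_ramp(samples, sample_rate, ramp_start, ramp_length, from_db, to_db):
    """Return a copy of samples with the gain ramp applied."""
    return [
        x * db_to_linear(
            ramp_gain_db(n / sample_rate, ramp_start, ramp_length, from_db, to_db)
        )
        for n, x in enumerate(samples)
    ]


# Hypothetical timing: the soft passage starts 10 s in and the drum hit
# lands just after 22 s, so the 12 s ramp is finished before the hit.
SAMPLE_RATE = 10  # deliberately tiny; a real session would use 44100 or more
steady_tone = [1.0] * (30 * SAMPLE_RATE)
ridden = apply_gain_ramp(steady_tone, SAMPLE_RATE, 10.0, 12.0, 0.0, -1.5)

print(round(ridden[0], 4))                 # before the ramp: unity gain
print(round(ridden[16 * SAMPLE_RATE], 4))  # halfway through the ramp
print(round(ridden[-1], 4))                # after the ramp: -1.5 dB
```

The point of the sketch is the placement: the whole ramp sits inside the soft passage, so the loud downbeat arrives at a constant gain and keeps its contrast with what came before.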

Some songs start with a very soft introduction, 
and this may have to be raised. Other songs start 
softly and build to a big climax. I like to start 
mastering by going directly to the climax. After I get
a great sound with the necessary processing, I
return to the beginning and if there’s room, I may
lower the gentle introduction, which will enhance
the body that follows by contrast. This also reduces
the temptation to raise the loud part so much that it
might be squashed by excessive processing. In the
following figure, I’ve reduced the level of a song’s
introduction, and slowly introduced a crescendo (20
seconds long) that enhances the natural build of the
song as it goes into the first chorus. The top panel is
at -1 dB gain, bottom panel is at unity (0 dB) gain,
achieved at the end of the crossfade.


The modern version of fader-riding.
Note that the gain drop is performed
in the soft passage preceding the
loud downbeat, thus preserving the
apparent impact of the downbeat.



A soft introduction has been
reduced even further, and the
impact of the body of the song is
enhanced by gradually increasing
the gain during the beginning of
the main part of the song.


Another trick is to increase the space before a
song, which increases its dynamic impact by
extending the tension caused by silence. Give the
ear a chance to adjust to silence and then hit them
with all you’ve got! The best musicians know how to
use space within their music; they consider the rests
to be as important as the notes.

In Conclusion

Macrodynamic manipulation is a sometimes
overlooked but powerful tool in the mastering
engineer’s arsenal. In the next chapter we move on
to the use of compressors, expanders and limiters to
manipulate microdynamics.


1 Surprise Symphony, No. 94 in G, 1791, incorporated a mischievous drumbeat
in the middle of a slow passage. This type of microdynamic instantaneous
impact is often termed a sforzando in western music. To 20th century ears,
Haydn’s piece seems rather lame. Especially after you’ve been exposed to John
Williams’ quasi-classical Suite from Close Encounters reproduced on a
decent Hi-Fi.

2 Especially with the influence of Afro-Caribbean musical forms on jazz (and
eventually R&B, fusion, and rock) when in the 1940’s Dizzy Gillespie brought
percussionist Chano Pozo into his band.

3 Naturally with many exceptions. For example, I think The Geto Boys Da Good
Da Bad and Da Ugly, one of the honor roll CDs (listed at www.digido.com), is a
masterpiece of inventive musicality, dynamic range, depth, and tone on the
same order as a good classical work.

4 It’s hard for a mastering engineer to return a master to a producer that isn’t
louder than what was sent, even if the original recording was already too loud
and compressed. But I find that producers like to receive recordings which are
clearer and more impacting than what they sent in, even if the master is not
quite as loud. Dare to try it!

5 Please do not confuse the term dynamic range reduction (compression) with
data rate reduction. Digital coding systems employ data rate reduction, so that
the bit rate (measured in kilobits per second) is less. Examples include the
MPEG (MP3) or Dolby AC-3 (now called Dolby Digital) systems.

Since it’s not good to refer to two different concepts with the same word, we
should encourage people to use the term Data Reduction System or Coding
System when referring to data and Compression only when referring to the
reduction of dynamic range.

6 Excessive is definitely in the ear of the behearer! It’s very important to develop
an esthetic which appreciates the benefits of dynamic range, and which also
knows when there is too much, or too little. This is clearly a matter of taste, as
well as objective knowledge of the requirements of the medium and listening
environment.

7 AGC (automatic gain control) has been given a bad name by its ubiquitous use
in consumer and professional camcorders. Listen to the news reports on TV
where a portable camera was used with AGC to see what I mean. You will hear
severe hiss modulation in between syllables, and the transient syllabic impact
is reduced.

8 This is true for most of the "natural" music genres, with some exceptions being
hip hop, psychedelic rock, performance art, etc., where the artists invite the
engineer to contribute surprising or rococo dynamic effects.

9 Producers don’t always use classical Italian dynamic terms to describe their
needs. The mastering engineer should choose the bonding language which is
best for the client: "Make it louder, man!"



Chapter 10

How to Manipulate Dynamic Range for Fun and Profit

Part Two: Downward Processors


I. Compressors and Limiters:
Objective Characteristics

Part two and Part three of this series are about
microdynamic manipulation, which is primarily
achieved through the use of dedicated dynamics
processors. In this chapter (part two), we look at
how downward processors work. Before we can
learn how to use devices such as compressors and
expanders, we must study the objective characteristics
of the devices which perform the job.

Transfer Curves (Compressors and Limiters) 

Let’s begin with the measurable
characteristics of processors which perform
downward compression, simply called compressors
and limiters.

A transfer curve is a picture of the input-to-output
gain characteristic of an amplifier or
processor. A straight wire or unity-gain* amplifier
would yield a straight diagonal line across the middle
at 45°, called the unity gain line. A family of linear
curves can be drawn, as in these three figures:



Three transfer curves. At left, a Unity-Gain Amplifier, then an amplifier with
10 dB gain, then with 10 dB loss (attenuation).


* Unity-gain means the ratio of output to input level is 1, or 0 dB.

Input level is plotted on the X axis, and output
on the Y. At left is a unity gain amplifier, followed
by one with 10 dB gain, and with 10 dB loss
(attenuation). As long as there is a straight line (not
a curve) at 45°, the amplifiers are linear. Notice that
the middle plot would yield distortion for any input
signals above -10 dBFS.
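
In dB terms a linear amplifier is nothing more than an addition, which is why its transfer curve is a 45° line shifted up or down. The sketch below assumes a digital stage whose ceiling is 0 dBFS; the function name `linear_amplifier` is made up for the illustration.

```python
def linear_amplifier(input_dbfs, gain_db):
    """Return (output_dbfs, overloads) for a fixed-gain stage.

    Assumes a digital stage whose ceiling is 0 dBFS.
    """
    output_dbfs = input_dbfs + gain_db
    return output_dbfs, output_dbfs > 0.0


for gain_db in (0.0, 10.0, -10.0):
    for input_dbfs in (-60.0, -10.0, -5.0):
        output_dbfs, overloads = linear_amplifier(input_dbfs, gain_db)
        note = "  <- above full scale, distorts" if overloads else ""
        print(f"gain {gain_db:+5.1f} dB: in {input_dbfs:6.1f} "
              f"-> out {output_dbfs:6.1f}{note}")
```

With 10 dB of gain, an input at -10 dBFS lands exactly on full scale and anything hotter would need an output above 0 dBFS, which is the overload region noted for the middle plot.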

The threshold of a compressor is defined as
the level above which gain reduction begins to
occur. Compression ratio is the ratio of input
change to output change above the threshold. At
left in the following figure is a simple compressor
with a fairly gentle 2.5:1 compression ratio, and a
threshold at around -40 dBFS (which is quite low
and would yield strong compression for loud
signals). 2.5:1 means that for a level increase of the
source of 2.5 dB, the output will only go up 1 dB, or
for a rise of 5 dB, the output will only go up 2 dB, or
as can be seen in the plot, an input change of 20 dB
yields an output change of a little less than 10 dB
(once the curve has reached its maximum slope).
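
The ratio arithmetic is easy to check with a few lines of code. This is a sketch of an idealized hard-knee curve, assuming the 2.5:1 ratio and -40 dBFS threshold quoted above; the function name is invented, and a real compressor's curve (including the one plotted) need not follow this ideal exactly.

```python
def compressed_level(input_dbfs, threshold_dbfs=-40.0, ratio=2.5):
    """Output level in dBFS of an idealized hard-knee compressor, no makeup gain."""
    if input_dbfs <= threshold_dbfs:
        return input_dbfs
    return threshold_dbfs + (input_dbfs - threshold_dbfs) / ratio


for rise_db in (2.5, 5.0, 20.0):
    output_rise = compressed_level(-40.0 + rise_db) - compressed_level(-40.0)
    print(f"input rises {rise_db:4.1f} dB above threshold "
          f"-> output rises {output_rise:4.1f} dB")
```

On this ideal curve a 20 dB input change gives exactly 8 dB of output change, in line with the "a little less than 10 dB" read off the plot.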

A compressor such as this would actually make loud
passages softer, because the output is less than the
input above threshold;
this is always the case
unless you follow the
compressor with a gain
makeup amplifier.

At left, Compressor with 2.5:1 ratio
and -40 dBFS Threshold and no gain
makeup. At right, the same compressor
with 20 dB gain makeup.

At the right-hand side of the figure, by using
gain makeup (a simple gain amplifier after the
compression section), we can restore the gain such
that a full level (0 dBFS) signal input will yield a
full level signal output. In this illustration, the
amplifier has an extreme amount of gain, 20 dB,
which would considerably amplify soft passages
(below the threshold). In typical use, makeup gains
are rarely more than 3 or 4 dB. Loud input passages
from about -40 to about -15 are still amplified in
this figure, but above about -15 dBFS, the curve
slopes back to unity gain and resembles that of a
linear amplifier. Far below the threshold, it’s a
fairly linear 20 dB amplifier and can have pretty low
distortion because there is no gain reduction action.
At full scale, 20 dB of gain makeup is summed with
20 dB of gain reduction, yielding 0 dB total gain.
This particular compressor model’s curve levels off
towards a straight line above a certain amount of
compression, so the ratio only holds true for the
first 15-20 dB above the threshold. Other
compressor models continue their steep slope, thus
maintaining their ratio far above the threshold.
There are as many varieties of compression shapes
as there are brands of compressors, and they all
give different sounds. To get the greatest esthetic
effect from any compressor, most of the music
action must occur around the threshold point,
where the curve’s shape is changing; thus, it is
likely a real-world compressor’s threshold would
be nearer -20 to -10 dBFS, where most of the
musical movement takes place.
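
Makeup gain can be sketched the same way. The model below is an assumption, not the compressor in the figure: it is an idealized hard-knee curve that keeps its 2.5:1 slope all the way to full scale (like the "other compressor models" just mentioned), so it needs 24 dB of makeup to bring a 0 dBFS input back to 0 dBFS, rather than the 20 dB of the plotted model whose curve levels off. Function names are hypothetical.

```python
def compressor_output(input_dbfs, threshold_dbfs, ratio, makeup_db):
    """Idealized hard-knee compressor followed by a makeup amplifier (dBFS)."""
    if input_dbfs > threshold_dbfs:
        input_dbfs = threshold_dbfs + (input_dbfs - threshold_dbfs) / ratio
    return input_dbfs + makeup_db


def makeup_for_full_scale(threshold_dbfs, ratio):
    """Makeup gain that maps a 0 dBFS input to a 0 dBFS output."""
    return -compressor_output(0.0, threshold_dbfs, ratio, 0.0)


THRESHOLD, RATIO = -40.0, 2.5
makeup = makeup_for_full_scale(THRESHOLD, RATIO)
print(f"makeup gain needed: {makeup:.1f} dB")
for input_dbfs in (-60.0, -40.0, -20.0, 0.0):
    output = compressor_output(input_dbfs, THRESHOLD, RATIO, makeup)
    print(f"in {input_dbfs:6.1f} dBFS -> out {output:6.1f} dBFS "
          f"(net gain {output - input_dbfs:+5.1f} dB)")
```

The net gain column shows the behavior described above: everything below the threshold is lifted by the full makeup amount, and the lift shrinks steadily as the input approaches full scale.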

The following figure shows a very high ratio of
10:1, without gain makeup. Notice that the output is
almost a horizontal line above the threshold. Most
authorities call any compressor with a ratio of 10:1

Chapter 10 118


















or greater a limiter. There are very few analog
compressors with greater ratios; however, some
digital limiters have been built with ratios of 1000:1
in order to prevent even the minutest excursion or
overload above full scale (0 dBFS). The portion of
the curve at or near the threshold is called the knee,
which is the transition between unity gain and
compression. The shape of the knee can make the
transition gentle, or hard. The term soft knee
refers to a rounded knee shape, and hard knee to a
sharp shape, where the compression or limiting
kicks in quickly above the threshold. Conceivably,
the change from unity to 10:1 could be
instantaneous, in which case the knee
would be a sharp angle instead of round, producing
a sharp sonic change, thus a limiting effect. The
need for a gentle knee depends a lot on how much
musical activity is occurring at the threshold. If
there is a lot of musical activity or movement around
the threshold, the knee shape can be critical. For
those models of compressors that do not have knee
adjustments, some of the effect of the knee can be
accomplished by tweaking the ratio and/or threshold.

Attack and Release Times 

Attack time is defined as the time between the
onset of a signal that is above threshold and full
gain reduction. It can be measured in microseconds or
milliseconds, though it can be as long as a second or
two. Typical compressor attacks used in music
range from 50 ms to 300 ms, with the average used
probably 100 ms. Release time, also known as
recovery time, is defined as the time between when
a signal drops below threshold and when the gain
returns to unity. Typical compressor release times
used in music range from 50 ms to 500 ms or as
much as a second or two, with the average used
probably 150-250 ms.* The terms short or fast with
attack or release time may be used interchangeably,
they mean the same thing. Similarly, slow and long
attack and release times mean the same.


At the left side of the following figure is the
envelope shape of a simple tone burst, from a high
level to a low one and back again.

At left, a simple tone burst from high to low level and back. At right, the same tone burst passed
through a compressor with very fast attack, high ratio, and fast release time.

At the right side is the same tone burst passed
through a compressor with a very fast attack, high
ratio, and fast release, and whose threshold is
midway between the loud and soft signals. Note that
the loud passages are instantly brought down, the
soft passages are instantly brought up and there is
less total dynamic range, judging by the relative
vertical heights (amplitudes).

* One manufacturer, DBX, measures release time in dB/second, which is
probably more accurate, but I find hard to get used to.


119 Dynamics: Part Two










At left in this next figure is the envelope of a
compressor with a low ratio, slow attack time and a
slow release time. Notice how the slow attack time
of the compressor permits some of the original
transient attack of the source to remain until the
compressor kicks in, at which point, the gain
reduction brings the level down. Then, when the
signal drops below threshold, it takes a moment for
the release time to take action, and the gain is still
low, then slowly the gain comes back up. A lot of the
compression effect (the "sound" of the
compressor) occurs during the critical release
period, since as you can see, except for the attack
phase, the compressor has actually reduced gain of
the high level signal.

At left, a Compressor with a low ratio, slow attack time and slow release time. At right, higher ratio,
faster attack and very fast release.

Contrast this with the compressor at the right,
which has a much higher ratio, faster attack, and
very fast release time. The higher ratio clamps the
high signal down farther, and with the fast release,
as soon as the signal goes below threshold, the
release time aggressively brings the level up. This
type of fast action can make music sound strongly
compressed because it brings down the loud
passages and quickly brings up the soft passages.
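
The effect of attack and release settings on a tone burst can be imitated with a very small model. Treat what follows as a sketch under assumptions, not a description of any real unit: it works on envelope levels in dB rather than on audio, uses a one-pole smoother whose time constant is the time to reach roughly 63% of the change (the definition above speaks of full gain reduction), and applies no makeup gain, so soft passages are restored rather than raised. All names and settings are made up.

```python
import math


def smoothing_coeff(time_s, rate_hz):
    """One-pole coefficient for a time constant in seconds."""
    return math.exp(-1.0 / (time_s * rate_hz))


def compress_envelope(levels_db, rate_hz, threshold_db, ratio, attack_s, release_s):
    """Apply smoothed gain reduction to a sequence of envelope levels in dB."""
    attack = smoothing_coeff(attack_s, rate_hz)
    release = smoothing_coeff(release_s, rate_hz)
    reduction_db = 0.0
    output = []
    for level_db in levels_db:
        over = level_db - threshold_db
        target = over - over / ratio if over > 0 else 0.0
        # Gain reduction growing -> attack time; shrinking -> release time.
        coeff = attack if target > reduction_db else release
        reduction_db = coeff * reduction_db + (1.0 - coeff) * target
        output.append(level_db - reduction_db)
    return output


RATE = 1000  # envelope points per second (assumed control rate)
burst = [-6.0] * 500 + [-30.0] * 500 + [-6.0] * 500  # loud, soft, loud
slow = compress_envelope(burst, RATE, -18.0, 4.0, attack_s=0.100, release_s=0.250)
fast = compress_envelope(burst, RATE, -18.0, 4.0, attack_s=0.001, release_s=0.010)

for label, index in (("onset", 0), ("end of loud", 499),
                     ("start of soft", 500), ("100 ms into soft", 600)):
    print(f"{label:17s} slow {slow[index]:7.2f} dB   fast {fast[index]:7.2f} dB")
```

With the slow settings the onset passes almost untouched and the gain is still held down well into the soft section; with the fast settings the loud level is clamped almost at once and the soft level recovers within a few tens of milliseconds.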




Here is another variation, a compressor with a
release delay:

Output of a Compressor
with a low ratio, slow
attack time, slow release
time plus release delay

A release delay control allows more flexibility
in painting the sound character. Very few
compressors provide this facility. It’s useful when
we want to retain more of the natural sound of the
instrument(s), not exaggerate its sustain when the
signal instantly goes soft, or reduce "breathing" or
hissing effects when the source is noisy. The release
delay is part of the subtle pastel color palette of the
mastering artist.

The next figure illustrates what happens when
the attack and release times are much too fast.

When the combination of
attack and release times
are extremely fast
(typically <50 ms), a
compressor can produce
severe distortion, as it
tries to follow the
individual frequencies
(waves) instead of the
general envelope shape of the music.

The distortion is caused by the compressor’s
action being so fast that it follows the shape of the
low frequency waveform rather than the overall
envelope of the music. This problem can occur with
release times shorter than about 50 ms and
correspondingly short attack times.
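
One way to see this numerically is to feed a steady low-frequency tone into a simple compressor model and watch how much the gain wobbles within each cycle. The sketch below is illustrative only: the detector (instantaneous rectified level), the one-pole smoothing, the 50 Hz tone and every setting are assumptions, and the function name is invented.

```python
import math


def gain_ripple_db(freq_hz, attack_s, release_s, sample_rate=8000,
                   threshold_db=-20.0, ratio=4.0, seconds=3.0):
    """Peak-to-peak variation of the gain reduction (dB) on a steady sine.

    Only the final second is measured, after the compressor has settled.
    """
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    reduction_db = 0.0
    history = []
    for n in range(int(seconds * sample_rate)):
        x = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        level_db = 20.0 * math.log10(max(abs(x), 1e-6))
        over = level_db - threshold_db
        target = over - over / ratio if over > 0 else 0.0
        coeff = attack if target > reduction_db else release
        reduction_db = coeff * reduction_db + (1.0 - coeff) * target
        history.append(reduction_db)
    settled = history[-sample_rate:]
    return max(settled) - min(settled)


FAST_RIPPLE = gain_ripple_db(50.0, attack_s=0.001, release_s=0.001)
SLOW_RIPPLE = gain_ripple_db(50.0, attack_s=0.100, release_s=0.250)
print(f"1 ms attack and release:      gain moves {FAST_RIPPLE:.1f} dB within each cycle")
print(f"100 ms attack, 250 ms release: gain moves {SLOW_RIPPLE:.2f} dB within each cycle")
```

A gain that swings by several dB inside every cycle of a 50 Hz tone is reshaping the waveform itself, which is heard as distortion; with the slower settings the gain barely moves during a cycle and only the overall envelope is affected.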




II. Microdynamic Manipulation: 
Adjusting the Impact of Music with 
a (downward) Compressor 

The Mixing Engineer as Artist 

Compressors, expanders and limiters form the
foundation of modern-day recording, mixing and
mastering. With the right device you can make a
recording sound more percussive or less percussive,
punchy or wimpy, smooth or bouncy, good or bad,
mediocre or excellent.

When used by skilled hands, compression has
produced some of the most beautiful recordings in
the world, and a lot of contemporary music genres
are based on the sound of compression, both in
mixing and mastering, from Disco to Rap to Heavy
Metal. A skilled engineer may intentionally use
creative compression to paint a mix and form new
special effects; this intended distortion has been
used in every style of modern music. The key
words here are intent and skill. Surprisingly,
however, some engineer/artists don’t know what
uncompressed, natural-sounding audio sounds
like. While more and more music is created in the
control room, I think it’s good to learn how to
capture natural sound before moving into the
abstract. Picasso was a creative genius, but he
approached his art systematically, first mastering
the natural plastic arts before moving into his
cubist period. Similarly, it’s good practice to know
the real sound of instruments. Try recording a well-
balanced group in a good acoustic space with just
two mikes; it’s a lot of work, and a lot of fun! Before
multitracking was invented, there was much less
need for compression, because close miking
exaggerates the natural dynamics of instruments
and vocals. At first, compressors were used to
control those instruments whose dynamics were
severely altered by close miking, e.g. vocals and
acoustic bass. Later, when modern music began to
emphasize rhythm, many instruments began to get
lost under the energy, inspiring the creative
possibilities of compressors and a totally new style
of recording and mixing. Certainly the advent of the
SSL console, with a compressor on every channel,
changed the sound of recorded music forever.

Limiting Versus Compression In Mastering 

Mastering requires new skills to be developed
since we generally work on overall mixes instead of
individual instruments. In mastering as well as
mixing, compression and limiting change the peak
to average ratio of music, and both tools reduce
dynamic range. Most mastering engineers use
compressors to intentionally change sound and
limiters to change sound as little as possible, but
simply enable it to be louder.* That’s why limiters
are used more often in mastering than in mixing.
There is no perfectly invisible limiter, but
compression changes the sound much more than
limiting does. Think of compression as a tool to
change the inner dynamics of music. While
reducing dynamic range, it can "beef up" or add
"punch" to low- and mid-level passages to make a
stronger musical message. With limiting, however,
with fast enough attack time (1 or 3 samples), and a
carefully-controlled fast release,* even several dB of
limiting can be transparent to the ear. Consider
limiting when you want to raise the apparent
loudness of material without severely affecting its
sound; consider compression or upward expansion
(see next chapter) when the material seems to lack
punch or strength or rhythmic movement.

* As with compressors, it is the gain makeup process that permits the output of a
limiter to be louder. When the peaks have been brought down, there is room to
bring the average level up without overloading.

* The faster the release time, the greater the distortion, which is why the only
successful limiters which use extra fast release times have auto release
control, which slows down the release time if the duration of the limiting is
greater than a few milliseconds. The effective release time of an auto-release
circuit can be as short as a couple of milliseconds, and as long as 50 to 150
milliseconds. If limiting a very short (invisible) transient, the release time
can be made very short.

The BBC performed research in the 1940’s
demonstrating that distortion shorter than about
6-10 ms is fairly inaudible, which was the basis for
the 6 ms integration time of the BBC PPM meter.
In this modern solid-state world, some transient
distortion as short as 1 ms will change the audible
sound of the initial transient, particularly for
instruments such as piano. So be sure to use your
ears before limiting or reducing even short
transients. With good equipment and mastering
technique, wide range program material with a true
peak to average ratio of 18 to 20 dB can often be
reduced to about 14 dB with little effect on the
clarity of the sound. That’s one of the reasons 30
IPS analog tape is desirable as the medium to mix
to: it has this limiting function built-in. A rule of
thumb is that short duration (a few milliseconds)
transients of unprocessed digital sources can be
reduced by 4 to 6 dB with little effect on the sound;
however, this cannot be done with analog tape
sources, which have already lost the short duration
transients. Any further transient reduction by
compression or limiting will not be transparent
(though it may still be esthetically acceptable or
even desirable).
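
Peak to average ratio is simple to measure on a block of samples. The sketch below makes two assumptions that the text does not spell out: "average" is taken to mean RMS level, and "peak" is the highest sample value (a true-peak meter would also look between samples). The test signals are invented.

```python
import math


def peak_to_average_db(samples):
    """Sample peak divided by RMS level, expressed in dB."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


# A steady sine wave has a peak to average ratio of about 3 dB.
sine = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
print(f"sine wave: {peak_to_average_db(sine):.2f} dB")

# Hypothetical "percussive" signal: the same tone at low level plus one
# full-scale spike standing in for a short transient.
percussive = [0.125 * x for x in sine]
percussive[250] = 1.0
print(f"tone plus transient: {peak_to_average_db(percussive):.2f} dB")
```

A single short spike is enough to push the figure to around 20 dB, which is why reducing only the briefest transients can bring the ratio down by several dB while leaving the body of the sound alone.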

All digital limiters affect the sound to some
extent, softening the transients and even fattening
the sound slightly, as they allow us to raise the
average level and the loudness. The less limiting we
use, the cleaner and more snappy the sound, unless
we are looking for a sound with softer transients. In
an ideal mastering session, the limiter should only
be acting on occasional inaudible peaks. Limiting
distortion is especially audible on material which
already has little peak information because a limiter
is not designed to work on the RMS portion of the
music and limiters can sound pretty ratty when
pushed into the RMS region. Watch out for severe
bass distortion because the time constants of a
limiter are too fast for optimal compression.

A manual for a certain digital limiter reads "For
best results, start out with a threshold of -6 dBFS."
This is like saying "always put a teaspoon of salt and
pepper on your food before tasting it." Instead,
mastering engineers should judge how much
limiting to use based on the desired absolute
loudness (compared with other CDs) and how much
degradation we can accept. Some sources can
tolerate 6 dB of limiting without significant
degradation, others 1 or none.

The World’s Most Transparent Digital Limiter 

The most transparent limiter is to use no
limiter at all! When we are trying to make a
section louder, if there is a very short peak
(transient) overload, for example, during a section
of a drumbeat, a skilled mastering engineer can
perform a short-duration gain drop that can be
invisible to the ear, with the DAW’s editor. This
manual limiting technique allows us to raise a
song’s apparent loudness without the attendant
distortion of a digital limiter, so it is the first
process to consider when working with open-
sounding music that can be ruined by too much
processing. We can often get away with 1 to 3 dB of
manual limiting, typically for a duration of less than
3 ms. But longer duration gain drops will affect the
sound as much as or more than a good digital
limiter. We use as little gain reduction as possible
and when trying to make material louder, squeeze
as much level as possible without clipping, for it
helps keep the limiting invisible.
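
The manual technique amounts to finding the few samples that would go over after the level is raised and pulling just that stretch back down. The following is a bare-bones sketch with invented names and an invented test signal; it applies the gain drop as an abrupt step, whereas an editor in a DAW would normally shape the edges with very short crossfades.

```python
import math


def find_overs(samples, ceiling=1.0):
    """Indices of samples whose magnitude exceeds the ceiling (full scale)."""
    return [n for n, x in enumerate(samples) if abs(x) > ceiling]


def manual_gain_drop(samples, start, length, drop_db):
    """Scale samples[start:start+length] by drop_db; leave the rest alone."""
    gain = 10.0 ** (drop_db / 20.0)
    return [x * gain if start <= n < start + length else x
            for n, x in enumerate(samples)]


SAMPLE_RATE = 44100
# Hypothetical material: a quiet tone with one short, hot transient.
signal = [0.3 * math.sin(2.0 * math.pi * 220.0 * n / SAMPLE_RATE)
          for n in range(2000)]
for n in range(1000, 1100):  # about 2.3 ms
    signal[n] = 0.9 * math.sin(2.0 * math.pi * 5000.0 * (n - 1000) / SAMPLE_RATE)

raised = [x * 10.0 ** (2.0 / 20.0) for x in signal]  # make it 2 dB louder
overs = find_overs(raised)
start, length = overs[0], overs[-1] - overs[0] + 1
fixed = manual_gain_drop(raised, start, length, -2.0)

print(f"overs after raising 2 dB: {len(overs)}")
print(f"gain drop covers {1000.0 * length / SAMPLE_RATE:.2f} ms")
print(f"overs after the manual drop: {len(find_overs(fixed))}")
```

Everything outside the short span keeps the full 2 dB increase; only the transient itself is returned to its original level.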

Equal-Loudness Comparisons 

Since loudness has such an effect on judgment, 
it is very important to make comparisons at
equal apparent loudness. During an instant A/B
comparison the processed version may seem to
sound better, if it is louder, but long-term listeners
prefer a less fatiguing sound which "breathes."
When you make comparisons at matched apparent
loudness, you may be surprised to discover that the 
processing is making the sound worse, and it was 
all an illusion. 
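
A minimal sketch of level matching before an A/B comparison follows. It assumes RMS level is an acceptable first approximation of apparent loudness, which is only roughly true; the final match should still be made by ear. The function names and the test signals are illustrative.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def matching_gain_db(reference, processed):
    """Gain in dB to apply to `processed` so its RMS equals that of `reference`."""
    return 20.0 * math.log10(rms(reference) / rms(processed))

def apply_gain_db(samples, gain_db):
    factor = 10.0 ** (gain_db / 20.0)
    return [x * factor for x in samples]

# Hypothetical case: the "processed" version is simply 6 dB hotter.
original = [0.25 * math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
processed = [2.0 * x for x in original]

trim_db = matching_gain_db(original, processed)
matched = apply_gain_db(processed, trim_db)
print(f"trim the processed version by {trim_db:.2f} dB before comparing")
```

Once the trim is applied, any remaining preference between the two versions is more likely to reflect the processing itself than the level difference.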

The Nitty-Gritty: Compression in Music Mastering

Consider this rhythmic passage, representing a 
piece of modern pop music: 

shooby dooby doo WOP... 

shooby dooby doo WOP... 

shooby dooby doo WOP 


The accent point in this rhythm comes on the
backbeat (WOP), often a snare drum hit. If we 
strongly compress this music piece, it might 
change to: 

SHOOBY DOOBY DOO WOP... 

SHOOBY DOOBY DOO WOP... 

SHOOBY DOOBY DOO WOP 

This completely removes the accent feel from 
the music, which is probably counterproductive. 

A light amount of compression might
accomplish this...

shooby dooby doo WOP... 

Shooby dooby doo WOP... 

shooby dooby doo WOP 

...which could be just what the doctor ordered 
for this music because strengthening the sub 
accents may give the music even more interest. 
Unless we're trying for a special effect, and
purposely creating an abstract composition, it's
wrong to go against the natural dynamics of music.
(Like the TV weatherperson who puts an accent on
the wrong syllable because they've been taught to
"punch" every sentence: "The weather FOR
tomorrow will be cloudy"). Much of hip hop music,
for example, is intentionally abstract: anything
goes, including any resemblance to the natural
attacks and decays of musical instruments.

To manipulate the music requires careful
adjustment of threshold, compressor attack and
release times. If the attack time is too short, the 
snare drum's initial transient could be softened,
losing the main accent and defeating the whole 
purpose of the compression. If the release time is 
too long, then the compressor won't recover fast
enough from the gain reduction of the main accent
to bring up the subaccent (listen and watch the
bounce of the gain reduction meter). If the release
time is too fast, the sound will begin to distort. If
the combination of attack and release time is not
ideal for the rhythm of the music, the sound will
be "squashed," and louder than the source, but
"wimpy loud" instead of "punchy loud." It's a
delicate process, requiring time, experience,
skill, and an excellent monitor system.
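
To make the interplay of attack and release concrete, here is a small sketch of how a compressor might smooth its gain reduction over time. It assumes a simple one-pole (exponential) smoother with separate attack and release time constants; real units differ in the shape of these curves, and every name and number below is illustrative rather than taken from any product.

```python
import math

def smooth_gain_reduction(target_db, attack_ms, release_ms, sample_rate=44100):
    """One-pole smoothing of a per-sample gain-reduction demand (dB, >= 0).

    The attack coefficient is used while more reduction is being demanded,
    the release coefficient while the reduction is relaxing.
    """
    a_att = math.exp(-1.0 / (attack_ms * 1e-3 * sample_rate))
    a_rel = math.exp(-1.0 / (release_ms * 1e-3 * sample_rate))
    out, state = [], 0.0
    for target in target_db:
        coeff = a_att if target > state else a_rel
        state = coeff * state + (1.0 - coeff) * target
        out.append(state)
    return out

# A 10 ms accent demanding 6 dB of reduction, followed by 100 ms of nothing.
burst = [6.0] * 441 + [0.0] * 4410

fast_release = smooth_gain_reduction(burst, attack_ms=1.0, release_ms=50.0)
slow_release = smooth_gain_reduction(burst, attack_ms=1.0, release_ms=250.0)
slow_attack = smooth_gain_reduction(burst, attack_ms=30.0, release_ms=50.0)

print(f"reduction at end of accent, 1 ms attack:  {fast_release[440]:.2f} dB")
print(f"reduction at end of accent, 30 ms attack: {slow_attack[440]:.2f} dB")
print(f"reduction 100 ms later, 50 ms release:    {fast_release[-1]:.2f} dB")
print(f"reduction 100 ms later, 250 ms release:   {slow_release[-1]:.2f} dB")
```

With the long release, most of the gain reduction is still in place 100 ms after the accent, so a subaccent arriving at that moment would be held down rather than brought up. With the slow attack, the accent itself passes largely untouched.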

The best place to start adjusting a compressor
is to find the approximate threshold first, with a
fairly high ratio and fast release time. Adjust the
threshold until the gain reduction meter bounces
as the "syllables" you want to affect pass by. This
ensures that the threshold is optimally placed
around the musical accents you want to manipulate,
the "action point" of the music. Then reduce the
ratio to very low and put the release time to about
250 ms to start. From then on, it's a matter of fine
tuning attack, release and ratio, with possibly a
readjustment of the threshold. The object is to put
the threshold in between the lower and higher
dynamics, so there is a constant alternation
between high and low (or no) compression with the
music. Too low a threshold will defeat the purpose,
which is to differentiate the "syllables" of the
music; with too low a threshold everything will be
brought up to a constant level.


Typical Ratios and Thresholds 

When working on microdynamics in the above
fashion, compression ratios most commonly used
in music mastering are from about 1.5:1 through
about 3:1, and typical thresholds in the -20 to -10
dBFS range. But there is no rule; some engineers
get great results with ratios of 5:1, whereas a delicate
painting might require a ratio as small as 1.01:1 or
a threshold of -3 dBFS. Sometimes a recording
requires the most gentle invisible compression
without trying to alter its built-in dynamics. One
trick to compress as invisibly as possible is to use
an extremely light ratio, say 1.01 to 1.1, and a very
low threshold, perhaps as low as -30 or -40 dBFS,
starting well below where the action is. We may
choose a low ratio to lightly control a recording
that's too jumpy or to give a recording some needed
body. It's unusual to see such low ratios used in
tracking and mixing but very common in mastering
of full program material, partly because with full
program material, larger ratios may draw attention
to the magic behind the curtain or reveal breathing,
pumping or other artifacts.
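
The relationship between threshold, ratio and gain reduction can be worked out with a few lines of arithmetic. The sketch below assumes an idealized hard-knee, steady-state compressor (no attack or release behavior, no soft knee); the function names are illustrative, and the settings are picked from the ranges mentioned above.

```python
def output_level_db(input_db, threshold_db, ratio):
    """Hard-knee static curve: above threshold, output rises 1 dB per `ratio` dB of input."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def gain_reduction_db(input_db, threshold_db, ratio):
    """How many dB the compressor pulls the signal down at this input level."""
    return input_db - output_level_db(input_db, threshold_db, ratio)

# Settings drawn from the ranges discussed in the text.
for threshold_db, ratio in [(-15.0, 2.0), (-40.0, 1.1), (-3.0, 1.01)]:
    for input_db in (-30.0, -10.0, 0.0):
        gr = gain_reduction_db(input_db, threshold_db, ratio)
        print(f"thr {threshold_db:6.1f} dBFS  ratio {ratio:4.2f}:1  "
              f"in {input_db:6.1f} dBFS  reduction {gr:5.2f} dB")
```

Under these assumptions, a 1.1:1 ratio with a -40 dBFS threshold is already working at -30 dBFS and reaches only a few dB of reduction at full scale, which is consistent with the gentle, nearly invisible control described above.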

We have noted before that every brand of 
processor (both compressors and expanders) has 
its own unique characteristics and sound. Part of 
the fun of mastering (and mixing) is discovering 
the special characteristics of different compressors. 
Even with the same settings, some are smooth,
others are punchy, some bring out percussion better
than others. This is not due to attack and release 
times per se, but rather to the curve or acceleration 
of the time constants, whether the device recovers 
linearly from gain reduction, whether the gain
returns to unity quickly or slowly at the beginning. 
Design engineers spend much research time 
psyching out these particular characteristics, and 
the best we poor mortals can do is listen and see 
what we like. 

Fancy Compressor Controls 

Some compressors provide a crest factor
control, usually expressed in decibels, or a range
from RMS (or full average) to quasi-peak through to
full peak. What this means is that the compressor
acts on either the average parts of the music, the
peak parts, or somewhere in between. Ostensibly,
compressors with RMS characteristics sound more 
natural as they correspond with the ear's sense of 
loudness, but the best-sounding compressor I own 
is peak-sensing. 
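
For readers who want to see the difference between peak and RMS sensing in numbers, this sketch measures both on two test signals and reports the crest factor (peak-to-RMS ratio in dB). It only illustrates the two measurements; it does not model how any particular compressor's detector works, and the names are assumptions.

```python
import math

def peak_level(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)

def rms_level(samples):
    """Root-mean-square (average power) level."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def crest_factor_db(samples):
    """Peak-to-RMS ratio in decibels."""
    return 20.0 * math.log10(peak_level(samples) / rms_level(samples))

# Ten whole cycles of a sine wave and of a square wave, both peaking at 1.0.
sine = [math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
square = [1.0 if k % 100 < 50 else -1.0 for k in range(1000)]

print(f"sine crest factor:   {crest_factor_db(sine):.2f} dB")
print(f"square crest factor: {crest_factor_db(square):.2f} dB")
```

A peak-sensing detector would treat these two signals identically, since both peak at the same value, while an RMS-sensing detector would see the square wave as about 3 dB hotter.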

The Weiss model DS1-MK2 is the first dynamics
processor I've encountered with two different
release time constants, release fast and release slow.
The user sets a threshold of average transient
duration, such as 80 ms, above which a sound
movement is called slow, and below which it is
called fast. Thus, instantaneous transients can be
given a faster release time, but sustained sounds
a slower one, which results in a more natural-
sounding compression, especially with heavy
compression. Indicator lights on the front panel 
aid in these adjustments. 

Compression and Monitoring 

I recall mixing a purist jazz recording using 
excellent powered monitors equipped with a driver 
protection circuit, which is ostensibly inactive 
except on peaks. However, when I arrived at my
mastering room, I discovered that the recording
"jumped out" too much, and required a bit of
compression, a fact hidden during the mix and
which I feel would have been similarly hidden had I 
monitored the mix with low-powered tube 
amplifiers (which self-compress). 

As I mentioned in Chapter 6, it is a myth that
you have to "precompress" for small systems. It's
actually the converse. I made an excellent snappy-
sounding master where we were concerned that the
upper dynamics might have a bit too much upward
impact. But when the recording was auditioned on a
typical boom box or bookshelf system, the peaks
were squashed compared to the mastering room
audition and actually would have benefited from
even more impact. Thus I have learned that if it
"sticks out a little too much" on a high-headroom
mastering system, then it's probably going to be
fine when played on an inferior system. However,
you'll never learn if something needs a bit more
compression or is too compressed when listening
on a monitor system that squashes the sound.

MYTH: Program compression is required to protect
small reproduction systems.

Multiband processing

Multiband compression is probably the most
powerful and potentially deadly audio process that's
ever been invented. Basically, a multiband
processor splits the information into two, three or
more frequency bands, so that the compression
action in one band will not cause another band to be
affected. For example, if the vocal causes a bit of
gain reduction, it will not pull down the bass drum
(or vice versa), which might occur if you used a full-
band compressor. This is the virtue and the
justification of splitting processing into multiple
bands. However, multiband compression has been
overused, and hyped in my opinion. It can easily
produce very unmusical sound or take a mix where
it doesn't want to be. This tool requires careful
judgment on the part of the mastering engineer.
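
The band-splitting idea can be sketched in a few lines. The example below is a deliberately simple assumption, not how commercial multiband units are built: it uses a one-pole low-pass filter for the low band and takes the high band as whatever is left over, so the two bands sum back to the input (to within rounding error) when no processing is applied. In a real processor each band would feed its own compressor; here a static gain per band stands in for that.

```python
import math

def split_two_bands(samples, crossover_hz, sample_rate=44100):
    """Split into (low, high): one-pole low-pass for the low band, remainder for the high band."""
    a = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high, state = [], [], 0.0
    for x in samples:
        state = a * state + (1.0 - a) * x
        low.append(state)
        high.append(x - state)
    return low, high

def recombine(low, high, low_gain_db=0.0, high_gain_db=0.0):
    """Sum the bands after applying an independent gain to each."""
    gl = 10.0 ** (low_gain_db / 20.0)
    gh = 10.0 ** (high_gain_db / 20.0)
    return [gl * l + gh * h for l, h in zip(low, high)]

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

sr = 44100
# Hypothetical "bass drum" energy: half a second of a 50 Hz tone.
kick = [math.sin(2.0 * math.pi * 50.0 * n / sr) for n in range(sr // 2)]
low, high = split_two_bands(kick, crossover_hz=200.0, sample_rate=sr)

print(f"low band rms {rms(low):.3f}, high band rms {rms(high):.3f}")
untouched = recombine(low, high)
print(f"max error when recombined flat: "
      f"{max(abs(u - x) for u, x in zip(untouched, kick)):.2e}")
```

Because most of the 50 Hz energy lands in the low band, gain reduction triggered there would leave the high band, where much of a vocal's presence lives, largely unaffected.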

Multiband processing was probably first
introduced by TC Electronic in their M5000, then
in their ubiquitous Finalizer, and brought to great
sophistication (and much better sound quality) in
their System 6000. Tube-Tech has produced a
three-band tube compressor. But multiple
bands are hardly needed; one or two bands are
usually enough. Rarely do even hip-hop recordings
need more than two bands to sound punchy and
strong. I use more than two bands in my mastering
no more than a few times a year, when multiple
bands have been a lifesaver. I largely use multiband
compression (and expansion) to fix bad mixes that
could not be remixed, for one key to a great master
is to start with a great mix!

When To Consider multiband processing 

• When there is a heavy and somewhat isolated bass
drum and/or bass, splitting the processing into
two bands prevents the drumbeats from
modulating the rest, or vice versa.

• When you want to let transients (percussive
sounds) through while still punching the sustain
of the sub accents or the continuous sounds.
Transients contain more high frequency energy
than continuous sounds, so splitting the processing
into a low and a high band permits using gentler
compression or no compression at high
frequencies (e.g., higher threshold, lower ratio).

• When there is too much sibilance. Sibilance can
be controlled by using selective compression in
the 3 through 9 kHz range (the actual frequency
has to be tuned by listening to the vocalist). Try a
very fast attack and medium release and a narrow
bandwidth for the active band.

• When the mix is bad or certain elements appear
to be weak in the mix, multiband processing can
save the day, assuming a remix is not possible.

I once received a rap project that was somehow
mixed with very low vocal and extremely loud
percussion and bass drum, and a remix was not
possible. By compressing and then raising the
level of the frequencies in the vocal range (circa
250 Hz) I was able to remix the piece and, very
nicely, turn the vocal up. Clearly, multiband
compression is a power that should be used
very wisely!

However, before trying multiband, first:

• See if simply raising the attack time in a one-
band compressor permits sufficient transient
energy to come through. Or, try upward expansion
(described in the next chapter) instead.

• Try using few bands, only two if possible. This
avoids potential phase shift and unnatural
relationships between the elements of the
mix, which can become the enemy of the mix
engineer's delicate creation.

Equalization or Multiband Compression? 

When multiband processing is available, the
line between equalization and dynamics processing
becomes nebulous, because the output levels of
each band form a basic equalizer. Use plain
equalization when instruments at all levels need
alteration. Or consider multiband compression, to
provide spectral balancing at different levels. For
example, a song may get harsh-sounding when it
gets loud, and it is possible to simulate the
euphonic high-frequency saturation characteristics 
of analog tape by using a bit more compression at 
high frequencies. 

If we're already using split dynamics, we make
our first pass at equalization with the outputs
(makeup gains) of each band. Multiband
compression and equalization work hand-in-hand.
Tonal balance will be affected by the crossover
frequencies, the amount of compression, and the
makeup gain of each band. In general, the more
compression, the duller the sound, because of the
loss of transients. I first try to solve this problem by
using less compression, or altering the attack time
of the high-frequency compressor, and as a last
resort, I use the high frequency band's makeup gain
or an equalizer to restore the high-frequency balance.

Clipping, Soft Clipping and Oversampled Clipping 

Clipping is the result of attempting to raise the
level higher than 0 dBFS, producing a square wave,
a severe form of distortion. Clippers are devices 
which electronically cut momentary peaks out of the 
waveform to allow the overall level to be raised. Soft 
clipping attempts to do this with less distortion. 
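
The difference between the two can be shown with a short sketch. Hard clipping is simply a ceiling on the sample values. For soft clipping, a hyperbolic tangent curve is assumed here because it is a common textbook choice; actual soft clippers use a variety of curves, and nothing below represents a specific product.

```python
import math

def hard_clip(x, ceiling=1.0):
    """Flatten anything beyond the ceiling (full scale)."""
    return max(-ceiling, min(ceiling, x))

def soft_clip(x, ceiling=1.0):
    """Bend the waveform gradually toward the ceiling (tanh curve, an assumed choice)."""
    return ceiling * math.tanh(x / ceiling)

# One cycle of a sine wave driven about 6 dB over full scale.
overdriven = [2.0 * math.sin(2.0 * math.pi * k / 64.0) for k in range(64)]
hard = [hard_clip(x) for x in overdriven]
soft = [soft_clip(x) for x in overdriven]

flat_top = sum(1 for y in hard if abs(y) == 1.0)
print(f"hard clip: {flat_top} of {len(hard)} samples sit flat at full scale")
print(f"soft clip: peak {max(abs(y) for y in soft):.3f}, never quite reaching 1.0")
```

Note that the soft clipper also alters samples well below full scale, so its gentler corners come at the cost of touching more of the waveform.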

I've decided that I don't like the quality of
distortion produced by clipping or soft clipping, at
least at 44.1 kHz SR (see Chapter 16). I believe there
are better approaches. The first is not to raise the
level at all, for many CDs are already too hot for
their own good. Or use a good limiter, which sounds
better than clipping to my ears. In Appendix 1,
radio gurus Bob Orban and Frank Foti explain why
clipping is a severe problem for radio processors.
The jury is still out when it comes to oversampled
clipping, whose distortion artifacts can be reduced
by half in the audible (20 Hz to 20 kHz) range, but
isn't that really like saying she's
a little bit pregnant?

Compression, Stereo Image, and Depth

One sure way to destroy the depth in a
recording is to compress it too much. Compression
brings up the inner voices in musical material.
Instruments that were in the back of the ensemble
are brought forward, and the ambience, depth,
width, and space are degraded. But not every
instrument should be "up front". Pay attention to
these effects when you compare processed vs.
unprocessed and listen for a long enough time to
absorb the subtle differences. Variety is the spice of
life. As always, make sure the cure isn't worse than
the disease.

The Mastering Engineer’s Dilemma 

Without compressors in CD changers and in
cars, it is extremely difficult for the mastering
engineer to fulfill the needs of both casual and
critical listeners. It is our duty to satisfy the
producer and the needs of the listeners, so we
should continue to use the amount of compression
necessary to make a recording sound good at home.
But try to avoid using more compression than is
required for home listening. This approach will
actually help radio play (see Appendix 1). If
compromises have to be made for car or casual
play, try to use transparent-sounding techniques
such as parallel compression (see next chapter),
which satisfy even critical listeners. Audition test
masters in all environments, hopefully arriving at
a decent compromise.

"Never in the history of mankind
have humans listened to such
compressed music as we listen to
now." - Bob Ludwig*


III. For the Mixing Engineer: How To
Avoid Hypercompression† during Mixing
and Tracking

Letter from a DIGIDO.COM visitor:


I found your site through a link. I was 
looking for information on how to use 
my compressors to make my music 
better. What I found was instruction on
how not to use my compressors to 
make my music better. The quality of 
my recordings has gone up greatly 
since I read your articles. 

How to Avoid making Hypercompressed Mixes 

Hypercompression is a form of sound
squashing, where everything has an unrelenting
and fatiguing intensity, with lost transients and
reduced definition. When overused, mastering
tools can produce this result, though the tools to do
it have migrated to the mixing studio, with a lot of
unfortunate sonic results (and a few sonic gems),
in my opinion. Hypercompression produces the
reverse effect from the intent of a good mix:
boring, lifeless mush. Perhaps the current slack in
music sales is related to hypercompression and its
tendency to give everything a monotonous
sameness. Is the public voting against compression
with its pocketbook? Lately it seems about the only
place we can enjoy good dynamic range and impact
is in the motion picture theatre. This book is partly
about how we can bring similar life to our music
masters. In this chapter we concentrate on some
advice for the mixing engineer.

* In correspondence. A variation of this quote is in Owsinski, Bobby. Mastering
Engineer's Handbook.

† The expressive term hypercompression was coined by Lynn Fuston of 3D Audio.

Let me tell you a sad story. A pop-rock band 
once sent me a mix that they felt a bit uneasy about, 
though they could not exactly express why. When 
I received the DAT it was obvious why. Here’s what 
I heard: 

• there was absolutely no dynamic range left, 
it was "maxxed to the max.” 

• there was no transient information. 

• the sound was grainy and literally lifeless 
(squashed) 

• all the songs sounded continuously and 
fatiguingly loud. I couldn't listen for more 
than a couple of minutes at a time. 

• although the obvious intent was to produce a 
hot, clear, punchy sound, the result was exactly 
the opposite. 

No wonder the band felt uneasy, but still they 
couldn’t put their finger on the problem. All the 
mix elements were there, and the tonality seemed
fine. It was easy for me to tell: their engineer had
mixed directly from multitrack through a 3-band
mastering compressor to DAT. In a way I admired
his work because he obviously had slaved for hours
at the dials "perfecting" this most disappointing
sound. Amazingly there were no intermodulation
artifacts between the frequency bands, an example
of the power of this box, for I was instantly able to
identify the brand and type of processor he had used.
I called the group and asked them to check if he had
made an unprocessed mix as well. Unfortunately he
had not. Sadly, I was unable to do anything to
salvage this production. I tried a bit of upward
expansion (to undo the damage), and the band felt
it was an improvement, but an upward expander can
only accomplish something when there is
"movement" in the source to grab onto (to amplify).
Why do you suppose he did this? The motivation
was eventually traced to a misguided desire to make
the recording "radio-ready" (see sidebar).

Here are some ways to avoid hypercompression
during mixing, which easily occurs when consoles
and DAWs have a compressor on every channel
strip. Everyone has his own style of working with
compressors and there are no rules. But I suggest
that when learning or beginning a mix, start by
working without any compressors! Then you'll
discover the necessity which was the mother of its
invention. The compressor will then become for
you a tool to handle problems which cannot be
handled with fader moves, not a crutch or substitute
for good recording and mixing techniques. Learn
about the natural dynamics and impact of musical
instruments, then begin to alter them with
compressors (which can include using compression
to create special effects). Every 5 years or so, give
yourself a reality check: try making a recording or
mix with little or no compression. You'll rediscover
the parts of music that make it lively
and aid in its clarity. It's a real
challenge, but a refresher course
may point out that less compression
will buy you a more open, more
musical sound than you've
previously been getting.

Start mixing fresh each time:
free yourself of preconceptions.
Although you compressed the bass
on 9 out of the last 10 albums, maybe this time you
won't need a compressor. Each musician is an
individual and their sound must be respected. In
general, the better the bass player, the less
compression will be needed, and the greater the
chance that compression will "choke up" his sound.
If you get to know the sound of your instrumentalists
you can then ask yourself: are you trying to
capture the sound of your instrumentalists or
intentionally creating a new sound? Get a great mix
that sounds alive and clear and big* and then later
see how much better it can be made in the
mastering suite, for mixing and mastering are two
different things. After mixing for a while, compare
the mix to the raw, unaltered monitor mix (which
can be a sobering experience): be honest, have you
lost some of the magic that you captured on the
recording day? Has the sound closed down instead
of opening up?


The Real Recipe for Radio-Ready 

The real recipe for Radio-Ready includes:

1) Write a great original song, use fabulous
singers and wonderful arrangements. 

2) Be innovative, not imitative. 

3) Make sure the music sounds good at home. 
Keep the dynamics lively, interesting and 
unsquashed, and some of that virtue will 
make it through the radio processing. 


* Not every piece of music should be big-sounding, but I think you get the idea.

The process of refining a mix should always
include revisiting your compression (and EQ)
settings and questioning your work. Compressors
are often used to create a tighter band sound,
making the rhythm instruments sit in a good,
constant place in the mix. But the wrong
compression setting can take away the sense of
natural breathing and openness that makes music
swing and sway. Thus, I recommend that during
mixing, after you've inserted a few compressors on
certain instruments (e.g., the bass, rhythm guitar,
vocal) and listened for a while, try comparing with
the compressors bypassed (total automation makes
that process easy; store two fader snapshots so you
can switch between them). If you've lost some of the
swing, or the subtleties of the musician's
performance, then try reducing or eliminating
some compression.

I think some of today's mix engineers have to
learn (or relearn) the ability to mix loudly and
clearly. Rock and Roll music is often a casualty of
compressor abuse. I receive rock mixes from well-
meaning engineers that should be getting louder
and louder and reach a climax, but which have lost
their intensity, producing "wimpy loud" sound. There
is dynamic inversion; instead of a chorus sounding
lively and dramatic, it's been pulled back. To make a
better sound and ease the mastering engineer's job,
check the climaxes; do they sound open, or
squashed? Squashing is a common problem in rock
mixes, for it is very difficult to maintain excitement
all the way to the highest peaks, but squashing is very
hard to repair in mastering. One trick is to start
mixing during the climax of the song, make the
climax sing and swing, using just enough
compression on individual instruments to do the
trick; then, return to the beginning, work your butt
off riding faders where necessary during the soft
passages but without changing the thresholds
from the position used for the peak of the song.
This helps avoid overcompression on the loud
passages and keeps the song sounding exciting. It's
better to send material that's mixed well and
powerfully at the mid levels but at the high levels is
not squashed. Even if the climaxes don't sound loud
enough to the mix engineer, he should consider it a
work in progress, for the mastering engineer can take
it to the next level of performance, with the punch it
needs at mid levels and strength and volume at high
levels.

* "It's like there has been an unlearning curve. As flexibility has improved,
respect for the integrity of the source has all but vanished as people become lost
in the possibilities." Bob Olhsson, Mastering Engineers Webboard.

I advise against mix engineers trying to mix 
through dedicated mastering processors unless you 
have the patience to refine the many parameters 
against the constantly-changing parameters of a 
mix in progress. Even bus compressors built into 
consoles are not usually optimized for processing
overall music. A processor on the bus will change 
the mix in mysterious ways; it's not predictable 
whether the vocal or any instrument will stand out, 
and it can fight the mix instead of helping it. 
Wideband bus compression causes all the 
instruments to be modulated by the attack and 
transients of the loudest instrument. A rim shot or 
cymbal crash can take down the reverberation and 
the sound of all the other instruments. Any 
compressor on a mix bus can quickly become a
crutch, a substitute for good mixing techniques. 
Some mix engineers add delicate bus compression 
after the mix has been achieved, to see if it fattens the
sound without deterioration. And to keep the bus
compressor from punching "holes" in your mix,
they use a very slow attack/release and very little
compression (e.g. 1 dB).

Hedge Your Bets. Many mix engineers will
subvert Murphy's Law of Experience and print two
versions to send to mastering, one with bus
compression and one without. I often find the
bus-compressed version has fatter bass (which the
client likes) but wimpy highs and attacks (which the
client doesn't like), but in mastering you can have
your cake and eat it too: I can supply dynamics
processing with carefully-applied multiple time
constants, yielding a more impacting result that
still has "fat bass." Of course, if the mix was made
so aggressively through the bus compressor that
removing it would change the mix, then there is no
point in providing two versions; be aware that you
are painting yourself into a corner, if a remix is not
an option.

But what if you want to mix aggressively...

This should be the province of the experienced
mixer who knows that this is the practice that works
best for the particular music, client, or audience
and who recognizes the fine subjective line between
aggressive bus compression and hypercompression.
In other words, some engineers mix aggressively on
purpose with the bus compressor (or against it),
which is only OK if:

• the music truly calls for it

• the experienced mix engineer is aware of all the
effects of the bus compressor on the sound


But be careful how you make it loud, because if
you deteriorate the clarity of the sound, there's
little that can be done to fix it in the mastering.
When mixing with aggressive bus
compression, I advise you to ascertain the
mastering engineer's opinion on this mix in
progress. Recently I
asked a client why he was using bus compression on
his mix, and he replied, "because I think it doesn't
sound loud enough without it." But through
demonstration, we found out that his mix sounded
wimpy loud but not better (e.g., fatter, punchier,
clearer, fuller). I suggest that you concentrate on
mixing and save the question of absolute loudness
for the mastering; when mixing, go for better when
auditioned at the same loudness (i.e. turn up the
monitor gain until it sounds loud enough). I think
mastering engineers can do a better job and for
much music would prefer not to receive bus-
compressed mixes: we can stand back objectively,
fine-tuning time constants and bandwidths,
maximizing the sound quality (and level) without
destroying the rhythm, melody or dynamics of the
music. Each tune will be optimally and precisely
adjusted in the context of the whole album.
Attempting these sorts of decisions during mixing,
without having the perspective of the entire album,
is dangerous since it's irreversible.

"Learning from your mistakes gives
you room to make even bigger ones!"
- Murphy's law of experience

If you wish to try your hand at mastering
processing after mixing, by all means do so,
perhaps as an example of the type of sound you are
looking for, but also bring an unprocessed mix
safety to the mastering session.

Monitor gain* has a tremendous effect on
these matters of judgment. The higher you place
the monitor gain, the less the chance of over-
compressing. If the music mix sounds properly
"punchy" at a higher monitor gain, then leave the
rest of the magic for the mastering rather than add
another DSP process or take the sound downhill.
The VU meter (as opposed to the peak meter) is
our friend. Have one hanging around, preferably
calibrated to 0 on the VU meter = -20 dBFS on the
peak meter with a sine wave, or if necessary, to as
high as -14 dBFS peak. If the VU meter is reading
hot, then the sound may be overcompressed.
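
The calibration arithmetic can be written out as a small sketch. It assumes a steady sine wave and ignores the meter's ballistics entirely, so it says nothing about how a real VU meter responds to music; the function names are illustrative.

```python
def vu_reading(sine_peak_dbfs, zero_vu_dbfs=-20.0):
    """Steady-state VU reading for a sine wave whose peak sits at sine_peak_dbfs.

    zero_vu_dbfs is the sine-wave peak level that has been calibrated to read 0 VU.
    """
    return sine_peak_dbfs - zero_vu_dbfs

def headroom_above_zero_vu(zero_vu_dbfs=-20.0):
    """Distance in dB from 0 VU (sine calibration) to digital full scale."""
    return 0.0 - zero_vu_dbfs

for calibration in (-20.0, -14.0):
    print(f"0 VU = {calibration:.0f} dBFS: "
          f"a -14 dBFS sine reads {vu_reading(-14.0, calibration):+.0f} VU, "
          f"headroom above 0 VU is {headroom_above_zero_vu(calibration):.0f} dB")
```

With the preferred -20 dBFS calibration there are 20 dB between 0 VU and full scale, while the -14 dBFS calibration leaves only 14 dB, which is why a mix that hovers around 0 VU on the hotter calibration has less room for peaks.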

Stop Emulating Squashed CDs

Many mixing engineers compare their mixes
against already-pressed CDs, but be careful what
you choose as a standard. Ironically, mastered CDs
often do not sound like what comes out of the mix,
so how can you emulate something which can only
be done post-mix? And emulating aggressively-
mastered CDs for a mix may contribute to the
vicious circle of escalating loudness. What you
really need is to hear the sound of a good mix before
it was sent for mastering. But since that's not
available, choose from the plenitude of pop records
that have been well-mixed and conservatively
mastered. Visit www.digido.com for The Honor Roll,
a listing of well-mixed and conservatively-
mastered current CDs.

* I prefer the term monitor gain to volume control. See Chapter 14.

Avoiding Compression Problems during Tracking 

When tracking vocalists (who have a habit of
belting now and then), a well-adjusted compressor
can sound reasonably transparent, and most
engineers agree the cure is better than the disease.
But watch out for a closed-in sound, clamping down
when the vocalist gets loud (which reduces clarity
and impact), which can be caused by improper time
constants, too high a ratio, or using the wrong
compressor. Compare IN versus BYPASS before
committing to tape. Match levels to make a fair
comparison. If you notice too much degradation,
maybe it's time to consider a different compressor
or change the settings you are using. The sound
should be open and clear... remember that no
amount of equalization in the mixdown can
substitute for capturing a clear sound quality during
tracking. This is true for all the lead instruments,
including trumpets and electric guitars. If possible,
put the uncompressed sound on a spare track; it
may save your life. If there's any rule, nine out of
ten engineers would prefer to save the decision on
drum and percussion compression until mixing.
There are always exceptions; every piece of music
is unique.



Chapter 11

How To Manipulate Dynamic Range for Fun and Profit

Part Three: The Lost Processes

Introduction


This chapter introduces two processes which
should be part of every audio engineer's vocabulary.
To be successful with them, you have to learn to
think like a contrarian, but it's well worth it.

I. Upward Compression 

Over-concentration on the use of downward
compressors makes it easy to overlook the
psychoacoustic fact that the ear is much more
forgiving of the upward "cheating" of soft passages
than of the awkward "pushing down" of loud
passages. The latter feels like an artificial loss while
the former can feel very natural.

Let me introduce you to a venerable 
compression technique which has finally come of 
age. Imagine compression that requires just a single 
knob—no need to adjust attack, threshold, release or 
ratio. The sound quality is so transparent* that 
careful listening is required to even know the circuit 
is in operation! A few years ago New Zealand radio 
engineer Richard Hulse discussed with me his 
practice of parallel compression,¹ which 
accomplishes upward compression. Richard was 
using analog components and got acceptable results, 
but he thought that a digital implementation could 
sound even better and suggested I try one. I found 
the digital version of this technique to be so 
successful that today I often use it to fatten sound 
and bring up soft passages in place of manual gain 
riding. The principle is quite simple: Take a source, 
and mix the output of a compressor with it. Many 

* For me, the term transparent means the signal path sounds as clean as the 
source. 






mix engineers have practiced this approach with 
their analog tools. In the digital domain, it is 
possible to sum the source with a compressor 
without any side effects, by using a precise time 
delay for the "dry" signal which exactly matches that 
of the compressor, as shown in this block diagram 
(one channel only of stereo shown): 



The Parallel Compression 
technique employs a matched 
time delay in the "dry" signal 
path to avoid phase shift or 
comb filtering. This yields very 
transparent-sounding upward 
compression. 


In principle, the distortion of the parallel 
compression technique can be much lower than 
standard (downward) compression, since most of 
the signal has a linear path, and the non-linear path 
is added to the main path.² The amount of 
compression is controlled by the attenuator or 
makeup gain. The object of the technique is for the 
parallel compressor to contribute less and less to 
the total sound as the signal gets louder. This is 
accomplished by using a very low threshold, thereby 
putting the parallel compressor into gain reduction 
almost all the time. 


Here are suggested optimal settings for the 
parallel compressor, derived from original 
experiments performed by Richard Hulse: 

• Threshold -50 dBFS. A very low threshold 
ensures that the parallel compressor will be into 
extreme gain reduction during loud passages. 
Because the output of the parallel compressor has 
been pushed down during loud passages, it will 
contribute only negligibly to the total level. In 
principle, if you add in a second signal that is 30 
dB or more down, the second signal will not 
perceptibly contribute to the total level. 

• Attack time as fast as possible. One millisecond or 
less if available. This ensures that the transient 
impact of the original sound will be preserved, for 
as soon as a loud transient hits, the compressor 
goes into gain reduction. It helps for this 
compressor to have look-ahead, which means that 
it has a built-in time delay that permits it to look 
at the incoming signal levels and perform 
predictive gain reduction. 

• Ratio 2:1 or 2.5:1 (I prefer 2.5). The net ratio of the 
sum of the parallel chain varies depending on how 
much of the parallel compressor is being added 
in. Richard has developed a chart so you can go by 
the numbers, but I find it unnecessary and simply 
go by ear. 

• Release time medium length. Experiments show 
that 250-350 milliseconds works best to avoid 
breathing or pumping, although in cases where 
the reverberation is very exposed, particularly a 
cappella music, as much as 500 ms may be needed 
to avoid overemphasizing the reverb tails. 

• Output level or makeup gain adjusted to taste. 
With the parallel compressor off (-∞ gain), there 
will be no compression. At 0 dB or higher, 
compression will be very noticeable, with soft or 
even medium-level passages being raised in level. 
A nice subtle compression can be achieved with 
makeup settings of -5 through -15 dB (the lower 
the level of the compressor, the less total 
compression). 
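
The way the parallel path contributes less and less as the level rises can be checked with a little arithmetic. What follows is only a minimal sketch in Python of the static (steady-state) behavior. It assumes an idealized hard-knee compressor with no attack or release, and a dry path that sums perfectly in phase with the compressed path. The function names and the default ratio and makeup values are illustrative parameters, not taken from any particular processor.

```python
import math

def compressor_out_db(in_db, threshold_db=-50.0, ratio=2.5, makeup_db=-10.0):
    """Static output level of an idealized hard-knee downward compressor."""
    if in_db <= threshold_db:
        out_db = in_db
    else:
        out_db = threshold_db + (in_db - threshold_db) / ratio
    return out_db + makeup_db

def parallel_out_db(in_db, **settings):
    """Level of dry + compressed, assuming the two paths sum exactly in phase."""
    dry = 10 ** (in_db / 20.0)
    wet = 10 ** (compressor_out_db(in_db, **settings) / 20.0)
    return 20.0 * math.log10(dry + wet)

def added_gain_db(in_db, **settings):
    """How many dB the parallel chain raises a steady signal at in_db."""
    return parallel_out_db(in_db, **settings) - in_db

# Soft passages are lifted noticeably; loud passages are left almost alone.
for level_db in (-60, -40, -20, 0):
    print(level_db, "dBFS in ->", round(added_gain_db(level_db), 2), "dB added")
```

Lowering `makeup_db` in this model reduces the lift at every level, which corresponds to the "less total compression" behavior described for the output fader.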




























To determine the time delay needed to 
compensate for the compressor, adjust the parallel 
compressor to a 1:1 ratio and unity gain output. If 
possible, invert the polarity to either half of the 
chain. Then adjust the time delay until there is a 
complete null. Typical delays are 5 to 10 samples, 
but can be much more if there is considerable 
look-ahead delay in the parallel compressor. If a 
(non-delayed) polarity invert is not available, adjust 
the time delay until signal level is maximum (it will 
have 6 dB extra gain when the delay is correct) and 
check with pink noise to confirm there is no 
comb-filter effect. 
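
The null test can be sketched in a few lines of Python. This is a simplified model, not a description of any particular workstation: it assumes the compressor, set to 1:1 and unity gain, behaves as a pure delay of a whole number of samples, and it uses white noise as the test signal. The 7-sample latency and all function names are hypothetical.

```python
import math
import random

def delay(signal, n):
    """Delay a list of samples by n samples, zero-padding the start."""
    if n <= 0:
        return list(signal)
    return [0.0] * n + signal[:len(signal) - n]

def residual_energy(dry, wet, n):
    """Energy left when the delayed dry path is summed with the inverted wet path."""
    delayed = delay(dry, n)
    return sum((d - w) ** 2 for d, w in zip(delayed, wet))

def find_matching_delay(dry, wet, max_delay=64):
    """Return the dry-path delay (in samples) that gives the deepest null."""
    return min(range(max_delay + 1), key=lambda n: residual_energy(dry, wet, n))

def summed_gain_db(dry, wet, n):
    """Level of (delayed dry + wet) relative to wet alone, without polarity inversion."""
    delayed = delay(dry, n)
    e_sum = sum((d + w) ** 2 for d, w in zip(delayed, wet))
    e_wet = sum(w * w for w in wet)
    return 10.0 * math.log10(e_sum / e_wet)

random.seed(0)
source = [random.uniform(-1.0, 1.0) for _ in range(2048)]  # white noise test signal
compressor_latency = 7                                      # hypothetical value
wet = delay(source, compressor_latency)

best = find_matching_delay(source, wet)
print("matching delay:", best, "samples")
print("gain when summed in phase:", round(summed_gain_db(source, wet, best), 2), "dB")
```

In this model the matched delay produces a complete null with the polarity inverted and about 6 dB of extra gain without it, which are the two checks described above.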

Correspondents have told me they have 
successfully implemented this technique in Pro 
Tools, Digital Performer, and in SADiE. Every 
digital processor can easily include a parallel 
compression algorithm. Weiss has incorporated it in 
their DS1-MK2. I’ve adapted a single engine of the 
TC Electronic System 6000 to stereo parallel 
compression: Feed the signal into the 5.1 (surround) 
compressor, use the front L/R channels as the "dry" 
signal, bypassing the sidechain. Use the SL/SR 
channels as the compressed signal. The time delay is 
automatically taken care of as all channels of the 5.1 
compressor have matched delay. I then assign the 
output level of the compressor to a fader and adjust 
to taste by listening. The fader level is a fair guide to 
how much compression is being applied; there is no 
need to look at a gain reduction meter. During 
operation, the contrarian engineer just looks for 
extreme low level passages, and adjusts the parallel 
compression until the level dips sound more natural 
or the sound gets a bit fatter and fuller if desired. 


Parallel compression can also be used 
multiband, to separately fatten a bass instrument, or 
to give more presence to low level passages, which is 
more like dynamic equalization than compression. I 
assign the output level of each band to a fader, and 
adjust the sound to taste. The nice thing about the 
fattening qualities of this compression technique 
when helping the bass instrument is that the body of 
the sound gets fatter without destroying the 
transient impact. Or when increasing the presence 
frequencies at low levels, the sound can be clearer 
and better defined without becoming harsh at mid 
or loud levels. 

Even at severe settings, parallel compression 
sounds much better to my ears than any squashing 
I’ve heard from severe downward compression. 
Unlike downward compression, this form of upward 
compression preserves the transients or initial 
attacks very well. In addition, there’s room to be 
expressive at the top levels with upward expansion 
(see next section) if the original material was too 
compressed at high levels. Like any process, if 
upward compression is pushed too far, it will 
eventually call attention to itself. The first audible 
artifact will be increased sustains and emphasized 
reverberation, then, finally, breathing or pumping. 
These artifacts can sometimes be reduced by raising 
the release time of the parallel compressor. 
However, if the music is so open that the process 
continues to call attention to itself, the only solution 
is to abandon the processor and manually raise the 
passages which are too soft. 




II. Upward Expansion 

Another underused but incredibly useful 
processing technique is upward expansion. Some 
people think of an upward expander as the 
uncompressor, but it is far more than that (indeed 
there is a limit to how much a sound can be restored 
once it has been excessively compressed). Rather, 
upward expanders can be used to emphasize 
different parts of the dynamic rhythm from those 
affected by downward compressors, and the result is 
often more consonant with the natural movement of 
the music. For example, upward expansion is great 
for restoring the liveliness of typical uninteresting 
musical samples from samplers. It can also put the 
snap back into a slightly-squashed snare drum. 
Upward expansion is definitely a technique worth 
learning, and is no more difficult to use than a 
downward compressor, once you learn to think like 
a contrarian and use the threshold, ratio, and 
attack/release. 

Historically, upward expanders were not easy to 
build until the advent of the VCA.³ Once you have a 
VCA-based compressor, it’s a simple matter to turn 
it into an upward expander by reversing the sign 
(polarity) of the sidechain signal. Probably the first 
commercial dedicated upward expander was in a 
device made by DBX called the model 117, circa 1971, 
designed to enhance dynamics in a hi-fi system. 
Another early upward expander was the Phase 
Linear Peak Unlimiter. The honor for the first 
digital upward expander goes to the Waves C1 
(plug-in), algorithms designed by Michael Gerzon. The 
first stand-alone digital upward expander was in the 
DBX Quantum mastering unit, followed shortly by 
the Weiss DS1-MK2. The Waves C4 (plug-in) is the 
first single processor to perform all of the four 
dynamics processes, though it can perform only one 
of the four at a time on each band. It is very 
desirable to be able to do simultaneous upward 
compression, upward expansion, and limiting in a 
single box. 

Ironically, downward compression doesn’t 
make the loud parts louder, it makes them softer, 
pushing ascending passages downward. A loudness 
increase is obtained as the incoming level decreases 
and the compressor goes into the release phase, 
raising the gain. In contrast, when the parameters 
have been optimized, upward expansion increases 
the loudness of passages that are ascending in 
volume, in rhythm with the upward motion of the 
music. (Hence it may be necessary to use output 
attenuation instead of makeup gain to prevent the 
output from overloading.) There is a small increase 
in dynamic range, but if used delicately for 
microdynamic purposes, the upward expander 
becomes as valuable a production tool as the 
downward compressor. 

This next figure shows an upward expander with 
a severe .75:1 ratio and threshold at -32 dBFS. 
Without attenuation it will overload with input levels 
exceeding about -10 dBFS. Note that the ratio of an 
upward expander can be expressed in decimal or 
fraction form depending on the manufacturer’s 
preference. The Waves and DBX units use decimal 
form, while the Weiss unit expresses this in fraction 
form as 1:1.33. Typically, the range of ratios used in 
upward expansion is far smaller than those used 
when compressing. Commonly, from a very gentle 
1:1.01 through about 1:1.2 (fraction); equivalent to 
from 0.99 through .83 (decimal). A common value 
used for music enhancement is around .91 decimal 
(1:1.10 fraction). 
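
The two ways of writing an expansion ratio are simply reciprocals of each other. As a minimal sketch in Python (the function names are illustrative, and the static curve assumes an idealized hard-knee expander with no time constants and no output attenuation):

```python
def fraction_to_decimal(expansion):
    """Convert a ratio written 1:expansion (e.g. 1:1.33) to decimal form (e.g. 0.75)."""
    return 1.0 / expansion

def decimal_to_fraction(decimal_ratio):
    """Convert decimal form (e.g. 0.91) to the x in 1:x (e.g. 1.10)."""
    return 1.0 / decimal_ratio

def expander_out_db(in_db, threshold_db, decimal_ratio):
    """Static output level of an idealized upward expander.

    Above threshold, every decimal_ratio dB of input rise becomes 1 dB of
    output rise; below threshold the signal passes unchanged.
    """
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / decimal_ratio

for expansion in (1.01, 1.10, 1.2, 1.33):
    print(f"1:{expansion} fraction = {fraction_to_decimal(expansion):.2f} decimal")
```

With a .75 decimal ratio, a passage that rises 1 dB above the threshold at the input rises about 1.33 dB at the output in this model.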



An upward 
expander with .75:1 
ratio, expressed in 
decimal (1:1.33 
expressed as a 
fraction). 

Threshold is -32 
dBFS, and without 
adding loss, the 
output will 
overload if input 
exceeds approximately 
-10 dBFS. 


The next figure contrasts fast and slow attack, 
and fast and slow release when used with an upward 
expander. As you can see, the dynamic 
characteristics are opposite from the compressor examples 
shown in the previous chapter. 

At left, upward expander with fast attack, slow release. 
At right, slow attack, fast release. 

The best way to learn how to use an upward 
expander is to compare it to a downward 
compressor, described in the chart on the next page 
(values given in the chart are only for general 
purpose guides). 

Compromises When Making Hot Masters 

Both Downward Compression and Upward 
Expansion result in compromises if you are trying to 
make a master super-hot (high absolute loudness). 
The problem with downward compression is that it 
is hard to avoid the squashing effect and loss of 
dynamics. By splitting the bands (multiband, see 
Chapter 10), you can slightly postpone the inevitable 
sonic degradation. The problem with upward 
expansion is that if you are trying to make a 
recording hot, you must follow the expander with a 
limiter to increase the level, but the limiter will fight 
the advantages of the expander, and soon becomes 
the limiting factor (oops!). When the limiter is used 
conservatively, it will not deteriorate the sharp 
transients, and the upward expander can do its job 
of making the upward-going dynamics more 
exciting. Prove it by bypassing the limiter at 
matched comparison levels and see if it’s hurting the 
sound of the music. If it is, and you cannot live with 
the degradation, the only solution is to master at a 
lower level. 









DOWNWARD COMPRESSION vs. UPWARD EXPANSION 

Downward compression: makes sound louder during the descent of the music 
(release phase). 
Upward expansion: makes sound louder during the rise of the music 
(attack phase). 

Downward compression: tends to make sound fatter and exaggerate low frequencies 
(subject to time constants and threshold). 
Upward expansion: tends to exaggerate transients and high frequencies 
(subject to time constants and threshold). 

Downward compression: Attacks that are too short (fast) cause transients to be 
lost. 
Upward expansion: Attacks as short as a few ms can restore and sharpen 
lost transients (e.g., from analog tape or overcompressed 
sources). 

Downward compression: Typical attacks 100 ms through 300 ms. Less than 100 tends 
to blur transients. 
Upward expansion: Typical attacks 1 ms through 300 ms. If a transient still 
sounds too sharp after trying >150 ms attack, perhaps this is 
not the right process for this music, or consider a touch of 
limiting after the expansion. 

Downward compression: tends to make things sound duller or warmer. 
Upward expansion: tends to make sounds brighter or sharper. 

Downward compression: If sounds "jump out" too much, raise the ratio, shorten the 
attack, and/or speed up the release. 
Upward expansion: If sounds "jump out" too much, lower the ratio, lengthen 
the attack, and/or slow down the release. 

Downward compression: If attacks seem too sharp, shorten the attack time. 
Upward expansion: If attacks seem too sharp, lengthen the attack time, or 
consider compression. 

Downward compression: If sustains seem too long or too prominent, lengthen the 
release time. 
Upward expansion: If sustains seem too short, lengthen the release time. 

Downward compression: If attacks seem too dull, lengthen the attack time. 
Upward expansion: If attacks need enhancement, shorten the attack time. 

Downward compression: If you don't like the percussiveness (e.g., snare drum), 
speed up the attack. To increase the ratio of rhythm to 
melody, lengthen the attack. Downward compression is not 
good at helping the impact of percussion instruments. 
Upward expansion: If you don't like the percussiveness (e.g., snare), slow down 
the attack. To increase the ratio of rhythm to melody, 
shorten (speed up) the attack. Upward expansion is very 
good at helping the impact of percussion instruments, 
however, sometimes at the expense of the vocal balance 
because the percussion becomes more prominent. 

Upward expansion: can work very well with upward compression, which fills in 
any perceived low level "holes" or lost sustain. 

Downward compression: Very easy to degrade the liveliness or "bounce" of the music 
if time constants are not optimized or if overused. 
Upward expansion: Very easy to enhance the liveliness or "bounce" of the 
music, but watch out for too much "bounce" or 
exaggerated dynamics. 

Downward compression: tends to go against the natural movement of the music, 
especially when the parameters are not optimized. 
Upward expansion: tends to work with the natural movement of the music, 
especially when the parameters have been optimized. 

Downward compression: tends to de-emphasize musical accents and emphasize the 
sub accents and sustains in reverse proportion to their 
original movement. 
Upward expansion: tends to emphasize the hottest musical accents and to a 
lesser degree, the sub accents in increased proportion to 
their original movement. 

Upward expansion: Very useful to follow with a limiter, as loud passages are 
being brought up by the expander. As long as the limiter is 
used to cheat down very short, momentary transients, it will 
not significantly diminish the effect of the upward 
expansion. The limiter's gain reduction meter should be 
moving very little and on brief occasions, while the 
expander's gain increase meter should be bouncing with the 
syllables of the music that's being enhanced. However, if 
the limiter's gain reduction meter starts to mirror the 
expander's gain increase meter, then the two processes are 
canceling each other out and there's too much limiting. 

Downward compression: can decrease the overall dynamic range of the song 
(macrodynamics), in addition to affecting the 
microdynamic balance of the music. 
Upward expansion: can increase the overall dynamic range of the song 
(macrodynamics), making a climax seem even more 
climactic, which can be very effective. 


III. Changing Microdynamics Manually 

It is possible to change musical microdynamics 
without using processors by doing manual edits and 
gain changes in a DAW. In this figure, I have 
artificially enhanced the attack of the first note of a song 
with very brief manual upward expansion (it's the 
brevity which makes it microdynamic): 

At left, the first few milliseconds of the note 
have a greater gain (in this case, 3 dB), and then 
there is a crossfade to a gain of 0 dB, resulting in a 
sforzando. An interesting story is that the producer 
was looking for a surprise when this track entered, 
and I initially had the beginning attack at +5 dB, but 
when he took the reference CD home, he reported 
the attack was too startling, so I took it back a bit for 
the final master. 
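
The same kind of edit can be expressed as a gain envelope. The following Python sketch is only an illustration: the 5 ms hold, the 10 ms fade, and the linear-in-dB shape of the crossfade are assumptions, since the shape a given DAW uses for its crossfades may differ, and the function names are hypothetical.

```python
def attack_boost_envelope(num_samples, sample_rate=44100, boost_db=3.0,
                          hold_ms=5.0, fade_ms=10.0):
    """Per-sample linear gain: boost_db during the hold, then a ramp down to 0 dB."""
    hold = int(sample_rate * hold_ms / 1000.0)
    fade = int(sample_rate * fade_ms / 1000.0)
    envelope = []
    for i in range(num_samples):
        if i < hold:
            gain_db = boost_db                      # boosted attack
        elif i < hold + fade:
            gain_db = boost_db * (1.0 - (i - hold) / fade)  # crossfade region
        else:
            gain_db = 0.0                           # rest of the note untouched
        envelope.append(10 ** (gain_db / 20.0))
    return envelope

def apply_envelope(samples, envelope):
    """Multiply each sample by its gain."""
    return [s * g for s, g in zip(samples, envelope)]

envelope = attack_boost_envelope(2000)
print("first sample gain:", round(envelope[0], 3), " last sample gain:", envelope[-1])
```

Changing `boost_db` from 3.0 to 5.0 reproduces, in this model, the more startling version of the attack described above.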

This chapter completes our dynamics trilogy. 


¹ Which he was initially calling sidechain compression, but I suggested a name 
change to avoid confusion with the sidechains of compressors. This technique 
was publicized by Mike Bevelle in the article Compressors and Limiters, 
Studio Sound, October 1977 (also reprinted June 1988). Engineers have been 
playing with parallel compression techniques for many years. 

² This was the principle of the Dolby A/SR systems, which used a direct signal 
path summed with a compressed one, doing as little harm to the audio as possible. 

³ Voltage controlled amplifier. In a console such as a Solid State Logic, all the 
audio in a channel passes through a VCA. The gating and compression are 
accomplished by summing sidechain information and feeding it to the control 
voltage element of the VCA. It is trivial to add upward expansion functions to 
any VCA type dynamics processor. 
















CHaPTer 12 

Noise 

Reduction 


I. Introduction 


Anthropologist Benjamin Whorf observed that 
the Eskimos have numerous words for snow. 
Similarly, audio engineers discern a great number 
of categories of what is collectively called noise. 
Laypersons generally do not distinguish distortion 
from noise but we find it useful: Distortion is a 
subset of the general category we call noise; it is a 
kind of noise that is correlated with the signal. 
Distortion can be low level and act much like what 
we normally call noise, or it can be high level and 
quite obtrusive, lying on the peaks of the signal. 

Noise itself can be continuous or intermittent, 
random or semi-random, colored (containing 
identifiable frequency components), impulsive, 
crackly, clicky, ticky (primarily high frequency), or 
poppy (primarily low frequency). Every kind of 
bothersome noise requires its own dedicated 
technical cure, but the most powerful cure is just to 
ignore the noise! Often we engineers tend to forget 
that the ear has a built-in noise-reduction 
mechanism which gives us the ability to separate 
signal from noise, and hear information buried 
within the noise. 

Thus the key to good-sounding noise reduction is 
not to remove all the noise, but to accept a small 
improvement as a victory. Remember that louder 
signals mask the noise, and also remember that the 
general public does not zero in on the noise as a 
problem. They’re paying attention to the music, and 
so should the engineer! So before considering any 
noise reduction technique, we need to judge 
whether a noise is truly distracting. 

MYTH: 
"I know that you can’t hear anything but noise on this 
tape, but if you get rid of it all, you’ll be able to hear 
my husband having sex with his lover."† 

The noise-reduction methods described in this 
chapter are all single-ended as opposed to 
complementary. The Dolby™ system is an example of a 
complementary, or two-step, noise-reduction 
system which applies one process during recording 
and an opposite process during playback. An 
important fact: no single-ended noise reduction 
system is perfect; all noise-reduction systems take 
away some degree of signal with the noise. Artifacts 
of overaggressive denoising include comb-filtering 
or phasing noises, known semi-affectionately as 
space monkeys, and low-level thumps and pops. Overly 
aggressive noise reduction can also remove the 
critical ambience and atmosphere from a recording. 

"The difficulty lies in the fact that 
reverberation tends to decay to noise. 
However, much of the directional 
information and ambience we perceive 
is from reverberation. Therefore, 
remove the reverb with the noise, and 
- in effect - you remove the walls, 
floor and ceiling from the room."* 

Sonic Solutions No-Noise™ and Cedar De-Noise 
permit fine-tuning of the frequency response of the 
noise-reduction curve, and a skilled engineer will 
tailor that response curve for the best compromise 
between artifacts and perceived noise reduction. 
What distinguishes a good noise reduction job from 
a bad one? Good taste. The engineer must 
continually retain perspective, because the more 
noise removed, the more noise revealed (noise itself 
masks other noise below it)! It’s like peeling the 
layers of an onion. If you remove some crackle from 
the right channel, suddenly you may hear some tics 
which were not previously audible in the left. In all 
cases, careful comparison between the source and 
the processed product is necessary to ensure that 
the music has not been damaged. Ironically, the 
quieter the original recording, the more effective a 
noise reduction process can be. In other words, the 
more separated the original signal is from the noise, 
the more easily can the noise-reduction system 
diminish the noise without hurting the signal. So a 
real noisy recording probably cannot be fixed 
without creating artifacts. 

II. Noise reduction techniques 

Simple Filtering 

A passage with obtrusive hiss-like noise which 
contains no high-frequency instruments can be 
treated with a simple high-frequency equalizer. For 
example, an electric piano solo introducing a song 
may be hissy, but that noise will be masked when the 
rest of the instruments enter. This is a candidate for 
a selective filter; say 1 to 4 dB dip circa 3-5 kHz (this 
is the range where the ear is most sensitive to hiss), 
active only during the piano introduction. However, 
even here the filter will affect harmonics of the 
piano, so we must make a judgment call. 

P-pops are a type of signal-related noise, so they 
are a form of distortion, and since they are primarily 
low frequency, can be treated with a selective 
high-pass filter, typically 100 Hz, but sometimes as high 
as 400 Hz. If the filter is applied briefly, the result 
can be artifact-free (invisible to the ear). In my 
DAW, I capture a short section with the filter, then, 
using the crossfade editor, narrow the extent of the 
filter to the p-pop; with practice the technique can 
be extremely fast. It is also possible to edit out just 
the offending portion of a p-pop. 

* Gordon Reid of Cedar, in a conversation on the Mastering Webboard. 
† A myth from the restoration community suggested by Gordon Reid of Cedar. 
In truth, it’s nearly impossible to derive intelligible information from a tape 
if the voices are barely intelligible or audible in the first place. 

Narrow-Band Expansion 

Compression techniques used in mixing and 
mastering (make-up gain, especially noticed during 
the release time) can bring up noise in original 
material such as tape hiss, preamp hiss, noisy guitar 
and synth amplifiers, all of which can either be 
perceived as problems or just "part of the sound." 
This is what makes our work so subjective. Since 
compression aggravated the noise, expanders are its 
cure. As little as 1 to 4 dB of reduction in a narrow 
band centered around 3-5 kHz can be very effective 
and if done right, invisible to the ear, performed 
with a multiband (downward) expander. Typically 
these units have 3 to 4 bands, but we will only use 
one. Start by finding a threshold, with initially a 
high expansion ratio, fast attack and release time. 
Zero in on a threshold that is just above the noise 
level. You’ll hear ugly chatter and bouncing of the 
noise floor because the time constants are so fast. 
Now, reduce the ratio to very small, below 1:2, 
perhaps even 1:1.1, and slow the release until there 
is little or no perceived modulation of the noise 
floor. Too much expansion, and you will hear 
artifacts such as pumping or ambience reduction. 
The attack will usually have to be much faster than 
the release so that fast crescendos will not be 
affected. Depending on the music, its dynamic 
characteristics and its original SNR, this subtle 
approach can yield artifact-free noise reduction. 
The other expander bands should be bypassed or 
ratios set to 1:1. A good expander will have 
look-ahead delay, which allows it to open before it’s hit by 
the signal, thereby conserving transients. If the 
expander approach does not work, then we will have 
to apply more sophisticated, dedicated 
noise-reduction processors. 
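
To see why such gentle ratios are enough, consider the static gain of a downward expander. This is a minimal Python sketch under simplifying assumptions: a hard knee, no attack or release, and a ratio written as 1:ratio. The function name and the -60 dBFS threshold are hypothetical.

```python
def expander_gain_db(level_db, threshold_db, ratio):
    """Gain change (in dB) of an idealized downward expander with ratio 1:ratio.

    Above threshold the band passes unchanged. Below threshold, each 1 dB the
    band level falls at the input becomes `ratio` dB at the output, so the
    extra attenuation is (threshold - level) * (ratio - 1).
    """
    if level_db >= threshold_db:
        return 0.0
    return -(threshold_db - level_db) * (ratio - 1.0)

threshold_db = -60.0  # hypothetical: set just above the noise floor in the band
for band_level_db in (-50.0, -70.0, -80.0, -100.0):
    gain = expander_gain_db(band_level_db, threshold_db, ratio=1.1)
    print(band_level_db, "dBFS ->", round(gain, 2), "dB")
```

At 1:1.1, a band level 10 to 40 dB under the threshold is reduced by only 1 to 4 dB in this model, which is the order of reduction suggested above.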

Complex Filtering 

Tonal noise can be diminished by using 
narrow-band selective filtering at the critical 
frequency. Sonic Solutions No-Noise, developed 
by Dr. J. Andrew Moorer, has a complex filtering 
option that permits the insertion of many 
high-resolution narrow-band filters, suitable for 
removing hum and buzz (harmonics of the hum). 
Before inserting the filters, it’s useful to do an FFT 
analysis of the noise floor to determine which 
harmonics are present so as to apply only the filters 
that are needed. In SADiE’s 2496 or Artemis 
systems, there is enough DSP power to insert many 
narrow-band filters in real time, and I have a 
dehumming preset with about 25 filters set for a Q of 
40 or higher. I’ve also found TC’s Backdrop, 
developed by Dr. Gilbert Soulodre, to be very 
effective with tonal noise if I can find a sample of 
noise without signal. Systems like Backdrop, Cedar, 
and No-Noise must sample a brief piece of noise 
(even one second will do) in order to remove it 
without affecting the signal.* Which brings up the 
point that you should not tightly cut the beginnings 
or edit material which is being sent in for noise 
reduction; the most likely candidate for a sample is a 
piece just before the downbeat. 

* Cedar calls this the noise fingerprint. 
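
A bank of narrow notches for dehumming can be sketched as follows. This Python example is not the algorithm used by any of the products named above. It uses the widely published "Audio EQ Cookbook" biquad notch, and the 60 Hz fundamental, the list of harmonics and the function names are assumptions chosen for illustration. In practice the harmonics would come from the FFT analysis of the noise floor.

```python
import cmath
import math

def notch_coefficients(freq_hz, q, sample_rate):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a[0] = 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def magnitude_db(b, a, freq_hz, sample_rate):
    """Magnitude response of one biquad at freq_hz, in dB (floored at -240 dB)."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(max(abs(num / den), 1e-12))

def hum_filter_bank(fundamental_hz, harmonics, q=40.0, sample_rate=44100):
    """One notch per selected harmonic of the hum fundamental."""
    return [notch_coefficients(fundamental_hz * k, q, sample_rate) for k in harmonics]

bank = hum_filter_bank(60.0, harmonics=[1, 2, 3, 5])  # hypothetical FFT findings
b, a = bank[0]
print("response at 60 Hz:", round(magnitude_db(b, a, 60.0, 44100), 1), "dB")
print("response at 1 kHz:", round(magnitude_db(b, a, 1000.0, 44100), 3), "dB")
```

With a Q of 40, the notch at 60 Hz is roughly 60 / 40 = 1.5 Hz wide, which is why a large number of such filters can be inserted with little effect on the music between the harmonics.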

Specialized Processors 

GML Labs has a specialized noise-reduction 
unit for hiss and continuous noise. Cedar has just 
produced a new miracle process called Retouch, 
currently available only for SADiE DAWs. Retouch is 
able to remove impulsive noises that no previous 
system could handle, such as a baby crying, chair 
squeaks, even people talking in the middle of a take. 
It is very expensive, but there is no substitute when 
you need it. 

Some manufacturers specialize in one kind of 
noise; some have separate (expensive) boxes to fix 
each of them. Each type of noise (scratch, crackle, 
hiss, buzz, rumble, thump, fitz, regular noise and 
irregular noise, high level and low level noise) 
needs its own dedicated correction algorithm. A 
decrackler is really a multiple declicker, detecting 
and interpolating each moment of crackle, so it 
requires great DSP power. Sonic and Cedar have the 
most popular high-end noise-reduction systems, 
with interesting entries from Algorithmix, 
Audiocube, TC Electronic and Waves. Sonic’s 
approach to continuous noise, such as hiss or 
rumble, is to use 2048 individual contiguous filters, 
constituting a serious multiband expander. Artifacts 
are minimized since multiband processing avoids 
interaction between bands. Sequoia has an excellent 
FIR filter which allows you to visually and 
ergonomically pick each offending harmonic and reduce it. 
When the noise source is varying in frequency, as 
from analog tapes with varying speed, a special kind 
of tracking filter is required. 




TC’s Backdrop is based on psychoacoustics and 
noise-masking, and is very effective on continuous 
or tonal noise such as hum, buzz, hiss and rumble, 
with minimal artifacts when properly adjusted. You 
get what you pay for, and the critical ear can tell the 
quality difference between the most expensive and 
cheapest systems. 

III. One Man’s Meat Is Another Man’s Poison 

I once mastered a punk rock album where the 
opening of one tune had an obvious electrical tic on 
top of the bass player's note. I removed the tic and 
the note was restored to its beauty—I thought. But 
then I heard from the producer that he missed the 
tic and so I had to put it back. Thus proving that 
beauty is in the ear of the behearer, and many noises 
are considered to be part of the music. Get to know 
each musical form (especially punk rock) and in 
some cases think about leaving it dirty instead of 
clean! 

IV. Manual Declicking, Dethumping, 
De-Distortioning, Depopping.... 

A good mastering system should have integrated 
manual denoising, which allows us to quickly and 
selectively clean up momentary noises. Declicking, 
dethumping, de-distortioning, depopping, and 
other techniques are critical mastering system 
features. The next figure, part A, shows a thunk from 
an LP record. The left channel of this figure (top 
panel) has already been dethunked, as can be seen by 
the horizontal marker above the left channel 
waveform. When reproduced, the slight DC level 
shift that remains does not translate to an audible 
noise. The right channel contains a severe thunk 
manifested by an instantaneous upward, then 
downward DC level shift (which causes woofers to 
rattle). With Sonic Solutions’ manual declicking, the 
correction process is as simple as marking the noise 
with the gates and selecting D Type from the menu. 
D Type is a powerful interpolator which can stitch 
together "impossible" waveforms and even remove 
brief dropouts or holes with no audible effects. 
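
The general idea of marking a region with gates and interpolating across it can be illustrated with the crudest possible interpolator. The Python sketch below draws a straight line between the samples on either side of the gates. It is emphatically not the D Type algorithm, whose workings are not described here, and a straight line would only be adequate for very short gaps. The function name and the sample values are hypothetical.

```python
def interpolate_gap(samples, start, end):
    """Replace samples[start:end] with a straight line between the neighbours.

    `start` and `end` play the role of the gates: start is the first damaged
    sample and end is the first good sample after the damage.
    """
    if start < 1 or end >= len(samples) or start >= end:
        raise ValueError("gates must enclose a region strictly inside the signal")
    left, right = samples[start - 1], samples[end]
    span = end - start + 1
    repaired = list(samples)
    for i in range(start, end):
        t = (i - start + 1) / span          # position between the good neighbours
        repaired[i] = left + (right - left) * t
    return repaired

clicked = [0.0, 0.1, 0.2, 5.0, -5.0, 0.5, 0.6]   # two samples of click at index 3-4
print(interpolate_gap(clicked, start=3, end=5))
```

A real declicker uses the waveform on both sides of the gates to predict what the missing audio should have been, rather than a straight line.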

In figure part B, the low frequency thunk and 
most of the DC discontinuity have been repaired, 
and the ramped DC level shift that remains 
(probably record warp) does not 
produce an audible noise. 

LP records are not the only 
sources that need declicking. 
Something as simple as an 
obtrusive vocal "lip smack" can 
be cleanly and quickly excised, 
and brief overload distortion can 
also be cleaned up by the 
interpolation technique. Sonic’s E-type 
decrackler can also selectively 
reduce sibilance. I use it instead 
of an overall sibilance controller 
when there are only a small 
number of offending s’s in the 
recording. E-type can also reduce 
and sometimes eliminate the 
harsh sound quality of clipping 
and digital overs. 

A: LP thunk in the right channel (different panel heights reflect different 
visual magnifications, not different amplitudes). The left channel has already 
been denoised (red bar). 

B: After manual declicking, the thunk is removed. 


In the figure on the next 
page, on top, a severe click is 
marked manually by the gates, 
and on the bottom it has been 
removed. Note that Sonic 
Solutions' automatic vertical gain 
conveniently amplifies the 
display to the highest amplitude 
in the view, which is no longer the click after it has 
been removed. 

Click in the top panel has been removed in the bottom (marked by the red bar). 

Manual declicking is extremely labor-intensive 
but very rewarding; it’s like hiring a meticulous 
gardener to remove each weed in your garden by 
hand, instead of using harmful chemicals. 

Here’s another remarkable before/after 
example (with a modern G4 computer, the repair
takes about 3-5 seconds).




On top, a click is surrounded by the gates. At bottom, after choosing D-Type from 
the No-noise menu, the click is removed.










































Chapter 13

Other Processing

I. Introduction


In this chapter we'll discuss important 
techniques such as how to determine proper 
polarity and inter-channel balance. In addition 
we’ll introduce specialized processing including MS 
Equalization or MS compression... and the world of 
mastering processors including reverberation, 
ambience extraction, "replicators," exciters, etc. 


II. The Balancing Act

First Check the Monitor Balance 

Adjusting inter-channel balance seems like a 
simple procedure, but many people have miscon- 
ceptions about how to achieve correct stereo 
balance. Before making any judgments of program
channel balance, first verify that your stereo
monitors themselves are balanced. Play a mono pink
noise signal at equal level to both stereo speakers
and confirm the pink noise image is tightly centered
between the speakers at all frequencies of the pink 
noise. Ride the monitor level control up and down 
within the normal ranges and confirm that the 
image of the pink noise remains centered. If it’s not 
tightly centered, then suspect the crossovers, 
drivers, level control, preamplifier channel balance 
or room acoustics. Chapter 14 covers the monitor 
calibration process in more detail. 


Polarity is "direction,” positive- or negative- 
going for an electrical signal, outward or inward for 
a transducer and the recommended standard is that 
positive voltage means positive pressure. If there’s 
an audible "hole” between the left and right 
loudspeakers (especially obvious at low 




frequencies), then one loudspeaker is moving
inward while another is moving outward, hence the
two wavefronts are canceling acoustically to some
degree. This is defined as incorrect relative
polarity, caused by improper wiring. Many of us still
use the antiquated phrase "the speakers are out of
phase," but we really mean they're "out of polarity
with each other" (because phase really means time).
In a 2-channel reproduction system, incorrect
relative polarity yields a hollow sound, imaging way
to the sides and not in the middle, with reduced bass
and lower midrange response. The solution is to
search each balanced line and speaker connection
for the pair of wires which are reversed.


"Never Use The Meters to Make 
Channel Balance Judgments"


Stereo Balance of the Program Material 

Music feels much better when the balance is
"locked in." When making stereo balance
judgments on program material, I consider left-
right channel balance errors of >0.2 dB to be
significant, but try to keep balance errors to
<0.1 dB. It is difficult to use meters to judge
channel balance because at any moment in time,
one channel will likely be higher than the other.
I've seen songs where one channel's meter (peak or
VU) is consistently a dB or so higher than the
other, but the balance is exactly correct. This is
because some high-frequency-dominant instruments
project better than flat meters indicate; for
example, with a mandolin on the right and viola on
the left, proper balance will likely occur with the
left meter reading higher, and it also depends on
who's doing the lead part! If in doubt, change the
balance 0.1 dB at a time until it sounds just right.
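To give a feel for how small a 0.1 dB balance step is, here is a minimal Python sketch that converts a decibel trim to a linear gain and applies it to one channel. The function names `db_to_gain` and `trim_balance` are illustrative assumptions, not part of any workstation; the final judgment still belongs to the ear, as described above.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def trim_balance(left, right, right_trim_db):
    """Return (left, right) with only the right channel trimmed by right_trim_db."""
    g = db_to_gain(right_trim_db)
    return list(left), [sample * g for sample in right]


# A 0.1 dB step changes amplitude by only about 1.2 percent.
print(round(db_to_gain(0.1), 4))   # 1.0116
print(round(db_to_gain(0.2), 4))   # 1.0233
```

Trimming one channel also changes the overall level slightly; splitting the trim between both channels (for example +0.05 dB and -0.05 dB) is one way to avoid that, if the tool allows it.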

A stereo position indicator (see Figure C16-01 in
the Color Plates) may help, but most times it just
tends to confirm what you've already heard. Judge
balance by ear, and when in doubt, check with the
producer, since the lead vocal is sometimes
intentionally placed off-center. Other times, even if
the lead vocal is supposed to be centered, this may
not produce the best balance between two
accompanying instruments located left and right, or
you may feel that the instruments on one side are
competing with the vocal's intelligibility. In that
case you have to think like a mix engineer, so it pays
to check the producer's intentions. Sometimes the
producer will say, "Oh, we didn't get that mix quite
right, it's possible the violins on the left need to
come up against the trumpets, use your judgment."
But if it takes more than about a dB of balance
adjustment to fix the problem, a remix may be in
order or the sound image may end up lopsided, and
it bears repeating: check with the producer.

Fixing Relative Polarity 

The so-called phase switches on consoles do not
change time, they invert the polarity. If two sources
are 180° out of phase at all frequencies (or a large
band of frequencies), then we conclude they are out
of polarity with each other, and we must correct the
polarity of one channel. If the correlation meter
(see Figure C16-01 in the Color Plates) shows a large
phase difference approaching 180°, check for
interchannel (relative) polarity errors by switching




the monitor to mono and inverting one channel's
polarity. The position that gives the most bass is the
correct one. Sometimes this is the only method to
verify the correct polarity when two spaced
omnidirectional microphones were used, since there
is a lot of random phase information in such a
recording. When several mikes are mixed together,
if only one pair is out of relative polarity, there's
little or nothing we can do about it in the mastering.
For example, if the percussion drops out in mono
but the vocal remains fine, there's nothing you can
do short of a remix.
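The mono test above can be roughly sketched in Python. This is only an illustration under stated assumptions: `correlation` computes a zero-lag normalized correlation (the kind of figure a correlation meter approximates), and `relative_polarity_suspect` compares the energy of the mono sum with and without one channel inverted. Both names are hypothetical, and the test signal is a plain low-frequency sine standing in for bass content; on real program material the decision is made by listening for the position with the most bass.

```python
import math


def correlation(left, right):
    """Zero-lag normalized correlation between two channels (-1.0 to +1.0)."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0


def mono_energy(left, right, invert_right=False):
    """Energy of the mono sum, optionally with the right channel inverted."""
    sign = -1.0 if invert_right else 1.0
    return sum((l + sign * r) ** 2 for l, r in zip(left, right))


def relative_polarity_suspect(left, right):
    """True if inverting one channel yields MORE energy in the mono sum."""
    return mono_energy(left, right, invert_right=True) > mono_energy(left, right)


fs = 44100
bass = [math.sin(2 * math.pi * 60 * n / fs) for n in range(fs // 10)]
flipped = [-0.8 * x for x in bass]   # same bass, reversed polarity, a bit quieter

print(round(correlation(bass, flipped), 3))       # -1.0
print(relative_polarity_suspect(bass, flipped))   # True
print(relative_polarity_suspect(bass, bass))      # False
```

With spaced omnidirectional microphones the correlation figure hovers around zero, which is why the text treats the mono bass test as the more reliable check.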

Fixing Absolute polarity 

By convention, absolute polarity is correct when
the loudspeaker moves outward (toward the
listener) with a positive-going pulse. First, check
the absolute polarity of your reproduction system,
with a polarity tester and polarity test signal. If you
do not have a polarity tester, play a Telarc orchestral
recording and confirm your woofers move outward
on the attack of the big bass drum.

It is debatable whether the human hearing 
mechanism can detect absolute polarity. If both 
speakers are moving inward when they should be 
moving outward, can you hear the difference? Many 
listeners claim to be sensitive to absolute polarity 
reversals, but scientists have shown that this may
only be due to a non-linearity in the loudspeaker
driver or magnet structure. Nevertheless, I
produced an absolute polarity test for Chesky 
Records, using a solo trumpet recorded in a natural 
space with a Blumlein microphone pair. When the 
polarity is incorrect, the trumpet appears (to most 

* Reversing wires on pins 2/3 of an AES/EBU cable does not affect the audio in
any way. Polarity reversal can be accomplished in the analog domain, or with a
digital processor.


listeners) about a meter further back. This is 
evidence that incorrect absolute polarity can affect 
how we mix and master. 

As a digital mastering engineer, I try to look for
evidence in the DAW waveform that the polarity is
correct. Most instruments produce waveforms with
ambiguous polarity, but major bass drum attacks
should be positive-going, and a solo trumpet on a
held note produces a distinct, positive-going
waveform. Sampled bass drum tracks have often
been so mangled that you cannot tell the polarity
from the waveform. Other than this direct evidence,
all you can do is experiment with both polarities to
see which sounds better. Of course, make sure the
polarity of both channels is changed.*

Fixing Phase shifts and Azimuth Error 

Modern-day digital consoles also have controls
to manipulate timing. A small timing error between
two sources is a phase error, which can cause comb
filtering, especially if combined to one channel. If
the two sources are 180° out of phase at only a few
frequencies, then they are out of timing (phase
shift), not out of polarity.
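The distinction can be illustrated with a little arithmetic. When a signal is summed with a copy of itself delayed by d samples, cancellation occurs only at frequencies where the delay equals an odd number of half-periods, that is at f = (2k + 1) x fs / (2d). A polarity inversion, by contrast, cancels at every frequency. The sketch below lists those notch frequencies; the function name `comb_notches` is a hypothetical helper and the delay is assumed to be a positive whole number of samples.

```python
def comb_notches(delay_samples, sample_rate=44100.0):
    """Frequencies (Hz, up to Nyquist) cancelled when a signal is summed
    with an equal-level copy delayed by delay_samples (must be > 0)."""
    nyquist = sample_rate / 2.0
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) * sample_rate / (2.0 * delay_samples)
        if f > nyquist:
            break
        notches.append(f)
        k += 1
    return notches


print(comb_notches(5))   # [4410.0, 13230.0, 22050.0]
print(comb_notches(1))   # [22050.0]
```

Even a one-sample error at 44.1 kHz puts a null at the top of the band, with a progressive high-frequency rolloff below it, which is consistent with judging the alignment by high frequency response.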

The procedure for correcting small
interchannel phase shifts requires a keen and
experienced ear. You must have a timing control
calibrated in samples. Switch the monitor to mono,
increase the delay on both channels equally, by
about 5 samples. Then increase and decrease the
relative timing of one channel a sample at a time.
Use the timing control like the focus on a camera,
with the goal being greatest high frequency response
and minimum comb filtering at the center of focus.





This procedure also can be used to align spot 
microphones with main mikes, (as described in 
Chapter 17) and it’s how we adjust analog azimuth if 
there are no tones on the tape. Single-sample 
increments are very coarse at 44.1 kHz SR, which is 
why Cedar has invented the digital azimuth 
corrector, which has sub-sample timing 
increments, accurate to 1% of a sample. 
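A quick calculation shows why whole-sample steps are coarse. The following sketch (helper names are assumptions for illustration) converts a delay in samples into microseconds and into degrees of phase at a given frequency, using phase = 360 x f x delay / fs.

```python
def sample_period_us(sample_rate=44100.0):
    """Duration of one sample in microseconds."""
    return 1e6 / sample_rate


def phase_shift_degrees(freq_hz, delay_samples, sample_rate=44100.0):
    """Phase shift in degrees that a delay of delay_samples causes at freq_hz."""
    return 360.0 * freq_hz * delay_samples / sample_rate


print(round(sample_period_us(), 2))                  # 22.68
print(round(phase_shift_degrees(10000.0, 1.0), 1))   # 81.6
print(round(phase_shift_degrees(10000.0, 0.01), 2))  # 0.82
```

So a single sample at 44.1 kHz is already more than 80 degrees of phase at 10 kHz, while a step of 1% of a sample is under one degree at the same frequency.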

DC Offset Removal 

Sometimes poorly-calibrated A/D converters
add a DC offset, where the centerline of the
waveform at rest is not exactly 0 volts. Also, some
poorly-implemented DSP processes add DC offset.
When DC offset is excessive, headroom is reduced
in the direction of the offset; in other words, raising
gain would cause the audio to clip prematurely
because the centerline is offset. But when using
digital limiters, slight loss of headroom due to DC
offset is not a problem. DC offset reveals itself on a
digital meter as a static low level signal, but this
could be noise, not DC; with DC offset, the
waveform in the EDL during a quiet passage will
appear offset from center. But the best way to
determine if there is a problem is to repeatedly play
and stop the material. If you hear a meaningful click
or a pop when starting or stopping, the DC offset
should be repaired. Prior to the advent of
high-resolution digital equalizers, I preferred not
to fix DC offset, but now the easiest solution is a very
steep high-pass filter, below, say, 30 Hz.
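As a rough illustration of measuring and removing DC, here is a minimal Python sketch. It is not the steep high-resolution filter recommended above: it swaps in a simple first-order "DC blocker" (y[n] = x[n] - x[n-1] + r*y[n-1]), which is far gentler, and the names `dc_offset` and `dc_blocker` and the coefficient r = 0.999 are assumptions chosen for the example. With r this close to 1 the corner frequency is only a few hertz at 44.1 kHz.

```python
import math


def dc_offset(samples):
    """Average value of the samples; nonzero means the centerline is shifted."""
    return sum(samples) / len(samples)


def dc_blocker(samples, r=0.999):
    """First-order high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


fs = 44100
# One second of a 100 Hz tone riding on a DC offset of 0.05 (full scale = 1.0).
tone = [0.05 + 0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
cleaned = dc_blocker(tone)

print(round(dc_offset(tone), 3))                 # 0.05
print(abs(dc_offset(cleaned[-4410:])) < 1e-4)    # True
```

Note that the filter output begins with a step equal to the offset, so in practice the filtered material would be started in silence or faded in.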


"A pitch corrector that sounds 
transparent and maintains the 
original timing—does not yet 
exist. ” 


Pitch and Time Correction 

It's impossible to fix the pitch of a vocalist when
he's mixed with other instruments that are on pitch,
so mastering engineers are not often called upon to
correct pitch. However, when a soloist is playing
a cappella, we've been asked to make corrections. The
simplest and cleanest form of pitch correction is
one where both the length (timing) and the pitch
of the material are altered, exactly like playing an
analog tape recorder faster or slower. This is done
by a sample rate conversion, and then reinserting
the material of the "wrong" sample rate into the
EDL; this technique can sound excellent if a good
SRC is used. But sometimes we're called upon to
change the speed of an entire song without changing
the pitch, or the pitch without changing the speed,
which are big challenges. I have never done it
without creating an audible degradation in the
sound; at worst the splicing in these algorithms
yields a gurgling or wavering sound quality, and at
best there is a fidelity reduction,* so we always
prefer to use the simpler SRC method if
permissible. As DSP has gotten more sophisticated,
pitch and time correctors have become much better,
and I have gotten away with using one for short
periods; but I have not yet heard a transparent one
and some degradation can be heard in a high-
resolution environment.


* A popular song by Cher, "Believe," takes advantage of the weaknesses of
such devices.
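The arithmetic behind the SRC ("varispeed") method can be sketched as follows. Raising the pitch by a ratio r means converting the material to a sample rate of fs / r and then playing the result at the original fs, so the length shrinks by the same ratio. The function name `varispeed` and its dictionary keys are hypothetical, and the pitch change is expressed in cents (100 cents = one semitone) as an assumption for the example.

```python
def varispeed(cents, duration_s, sample_rate=44100.0):
    """Pitch ratio, new duration and SRC target rate for a tape-style
    speed change of `cents` (positive = faster and higher)."""
    ratio = 2.0 ** (cents / 1200.0)
    return {
        "ratio": ratio,
        "new_duration_s": duration_s / ratio,
        # Convert to this rate, then play the result at sample_rate.
        "src_target_hz": sample_rate / ratio,
    }


up_semitone = varispeed(100.0, 180.0)
print(round(up_semitone["ratio"], 4))            # 1.0595
print(round(up_semitone["new_duration_s"], 1))   # 169.9
```

A three-minute song raised by one semitone therefore ends about ten seconds sooner, which is the trade-off the text describes: pitch and timing change together.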





III. "Remixing” at the 
Mastering Session 

Vocal Up and Vocal Down Mixes

The mixing session is often hectic and it's a good
idea to hedge your bets by printing alternate mixes,
e.g., "vocal up," and "vocal down" (by about 1/2 to 3/4
dB). Later, in the pristine acoustics of the mastering
environment we can choose the best mix, that which
works best in the context of mastering processing.

Mastering from Multitrack Stems 

A client brought a DAT with 10 songs. On one of 
the songs, the bass was not mixed loudly enough 
(this can happen to even the best producer). We 
were able to bring up the bass with a narrow-band 
equalizer that had little effect on the vocal, but when 
the producer took the ref home, he was dissatisfied. 
In his view the advantages of the increased bass 
were offset by the effect it had on the delicacy of the 
vocal. He asked if he could bring me a DAT of just 
the bass part so that it could be raised in mastering. 

I asked for a DAT with a full mix reference on 
one channel for synchronization purposes, and the 
isolated bass on the other. I was able to load the DAT 
into my workstation, synchronize the isolated bass, 
and raise the bass instrument in the mastering 
environment, without affecting the vocal. It was an
unequivocal success. This is an example of an
unsynchronized stem, and since the bass is also
present in the full mix, there is danger of phase
cancellation between the full mix and the added bass
track if they are not perfectly synchronized. I do not
recommend this practice; instead, all stems should 
be sample-accurate synchronized, begin at the same 


timestamp, and ideally, each stem should have 
unique elements.* 

Another client doing the album of a pianist with
orchestra brought a four-track Exabyte archive in 
Sonic Solutions format, with the piano isolated on 
two tracks. In the mastering we could adjust or 
equalize the solo piano separately. 

When a stereo mix is done to multiple stems,
there are typically six tracks (3 pairs), each with its
own reverb: vocal, rhythm, and melody
instruments. Mastering engineer Bob Olhsson has
pointed out that surround mixing demands the stem
approach, because clients certainly are not going to
make multiple "vocal up" 6-channel surround
mixes. Instead, mastering will become an extension
of the mix environment. Producers will send 24-
track tapes with stems divided into multiple 5.1
groups, such as vocals, bass, rhythm, etc., which if
reproduced at unity gain, represent the mix as the
producer put it down in the control room.

MS Mastering 

Mastering engineers are always seeking ways of 
repairing or enhancing one element of a recording 
without detriment to any other. There are always 
tradeoffs, but judicious use of MS tools can be 
lifesavers, turning a good recording into a great one, 
or saving a so-so recording from the dust-heap. 
(Nothing can repair bad musicianship, and autotune
doesn't work on mixed material).

A client had mixed in a bass-light room and his 
bass was very boomy, right up to about 180 Hz. At 
first the vocal came down slightly when I corrected

* Films are always mixed to stems, e.g., dialog, music, effects.





the boomy bass, but through MS processing
techniques, I was able to produce a perfectly-
balanced master. MS stands for Mid/Side, or
Mono/Stereo. In MS microphone technique, a
cardioid, front-facing microphone is fed to the M,
or mono channel, and a figure 8, side-facing
microphone is fed to the S, or stereo channel. A
simple decoder (just an audio mixer) combines
these two channels to produce L(eft) and R(ight)
outputs. Here's the decoder formula: M plus S
equals L, M minus S equals R.¹ Here's how to decode
in the mixer: feed M to fader 1, S to fader 2, pan both
to the left. Feed M to fader 3, S to fader 4, invert the
polarity of fader 4 ("minus S"), pan both to the
right. Start with all faders at unity gain, and change
the M/S ratio to taste. With more M in the mix, it
becomes more monophonic (centered); with more
S, the more wide-spread, diffuse, or vague the
sound becomes. If you mute the M channel, you will
hear a hole in the middle, containing largely the
reverberation and the instruments at the extreme
sides. Mute the S channel, and you will largely hear
the vocalist; the sound collapses, missing richness
and space. There's little separation between M and S
channels, but enough to accomplish a lot of control
on a simple 2-track. It's great for film work: the
apparent distance and position of an actor can be
changed by simple manipulation of two faders.
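The four-fader decode described above can be sketched in a few lines of Python. This is only an illustration of the formula L = M + S, R = M - S; the function name `decode_on_mixer` and its parameters are assumptions, with `None` standing in for a muted fader.

```python
def fader_gain(db):
    """Linear gain for a fader setting in dB; None means the fader is muted."""
    return 0.0 if db is None else 10.0 ** (db / 20.0)


def decode_on_mixer(mid, side, m_fader_db=0.0, s_fader_db=0.0):
    """Decode M and S sample lists to (left, right) with the four-fader patch."""
    gm = fader_gain(m_fader_db)
    gs = fader_gain(s_fader_db)
    # Faders 1 (M) and 2 (S), both panned left: L = M + S
    left = [gm * m + gs * s for m, s in zip(mid, side)]
    # Faders 3 (M) and 4 (S, polarity inverted), both panned right: R = M - S
    right = [gm * m - gs * s for m, s in zip(mid, side)]
    return left, right


left, right = decode_on_mixer([1.0, 0.5], [0.25, -0.5])
print(left, right)          # [1.25, 0.0] [0.75, 1.0]

# Muting S collapses the image: both outputs carry the same M signal.
mono_l, mono_r = decode_on_mixer([1.0, 0.5], [0.25, -0.5], s_fader_db=None)
print(mono_l == mono_r)     # True
```

Changing `m_fader_db` or `s_fader_db` corresponds to "changing the M/S ratio to taste" in the text.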

The MS technique doesn't have to be reserved
to a miking technique. We can separate an ordinary
stereo recording into its center and side elements,
and then separately process those elements. I tell
my clients I'm making three tracks from two. For
example, let's take a stereo recording with a weak,
center-channel vocalist. First we feed it through our
MS encoder, which separates the signal into M and S
and we decrease the S level or increase the M level.
Listening at the output of the MS decoder, presto,
the vocal level comes up, as does the bass (usually)
and every other centered instrument. In addition,
the stereo width narrows, which often isn't
desirable. But at least we raised the vocalist and
saved the day! Similarly, I've used MS to fix the ratio
between a center-located lead vocalist and side-
located background singers, even varying the MS
ratio between verse and chorus of the song. Some
processors have built-in width controls; what they
do is internally convert to MS format, adjust the M/S
ratio, and then reconvert to LR format. The Waves S1
plug-in processor's width control is gain-
compensated, so the apparent total level is held
constant as the width is changed. You can
accomplish the same thing by lowering the S as you
raise the M, or vice-versa.
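A minimal sketch of the encode, adjust, decode chain follows. The encoder scaling is an assumption: M = (L + R) / 2 and S = (L - R) / 2 are chosen here so that decoding with L = M + S, R = M - S returns the original signal at unity gain; real encoders may scale differently. All function names are illustrative.

```python
def ms_encode(left, right):
    """Split L/R into mid and side (assumed scaling: half sum, half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recombine mid and side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def adjust_ms_ratio(left, right, mid_db=0.0, side_db=0.0):
    """Encode to MS, apply separate gains in dB to M and S, decode back to L/R."""
    gm = 10.0 ** (mid_db / 20.0)
    gs = 10.0 ** (side_db / 20.0)
    mid, side = ms_encode(left, right)
    return ms_decode([m * gm for m in mid], [s * gs for s in side])


# A centered "vocal" (equal in both channels) plus something only on the left.
vocal = [0.2, -0.1, 0.3]
left = [v + x for v, x in zip(vocal, [0.4, 0.0, -0.2])]
right = list(vocal)

# Raise M by 2 dB: centered material comes up, and the image narrows somewhat.
new_left, new_right = adjust_ms_ratio(left, right, mid_db=2.0)
```

Lowering `side_db` while raising `mid_db` by a matching amount is one way to approximate the gain-compensated behavior mentioned above, though the exact compensation law of any commercial width control is not something this sketch attempts to reproduce.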

Automating the MS correction. When vocal (or
center instrument level) has to be selectively
tweaked, either the plug-in can be automated, or we
can correct the problem directly in an EDL without
using any processor. To raise the (centered) vocal,
add a duplicate of the material in another stream,
with the channels reversed. Add this in at as low a
level as tolerable (typically -12 to -16 dB), for if
taken to an extreme it will turn the entire material to
monophonic. I may add a touch of K-Stereo
processing (described later) to compensate for any
loss of ambience, width or sense of space, and lower
the bass gain to reduce center-channel bass build-
up. By contrast, in places where the center vocal




sticks out too much, subtract a duplicate of the
material in another stream, with the channels
reversed. In other words, add in a reversed-
polarity, reversed-channel duplicate of the source
material. A crossfade into and out of the material in
the extra stream is the automation that raises or
lowers the level of the center-channel material.
Another way to automate this process is to add an
MS encode-decode plug-in to the mixer, and
automate the panning between the M and S
channels on the encode side.
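Why the channel-reversed duplicate works can be seen with a short derivation. If the duplicate is added at linear gain g, then L' = L + gR and R' = R + gL, so the sum becomes (1 + g)(L + R) while the difference becomes (1 - g)(L - R): the center rises and the sides fall. The sketch below applies the trick and reports the resulting change in M/S ratio; the function names are hypothetical and the duplicate level is assumed to be below 0 dB.

```python
import math


def add_swapped_duplicate(left, right, level_db, subtract=False):
    """Mix in a channel-reversed duplicate at level_db (negative dB).
    subtract=True adds it with reversed polarity instead."""
    g = 10.0 ** (level_db / 20.0)
    if subtract:
        g = -g
    new_left = [l + g * r for l, r in zip(left, right)]
    new_right = [r + g * l for l, r in zip(left, right)]
    return new_left, new_right


def ms_ratio_change_db(level_db, subtract=False):
    """Change in M/S ratio (dB) caused by the swapped duplicate."""
    g = 10.0 ** (level_db / 20.0)
    if subtract:
        g = -g
    return 20.0 * math.log10((1.0 + g) / (1.0 - g))


print(round(ms_ratio_change_db(-12.0), 1))                  # 4.5
print(round(ms_ratio_change_db(-16.0), 1))                  # 2.8
print(round(ms_ratio_change_db(-12.0, subtract=True), 1))   # -4.5
```

Under these assumptions, a duplicate at -12 dB shifts the M/S ratio by roughly 4.5 dB, which suggests why the text recommends keeping the added stream as low as tolerable.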

MS EQ. We can accomplish a lot by manipulating
the M and S signals with equalization. Let's
take our stereo recording with weak centered
vocalist, encode it into MS, and apply separate
equalization to the M and S channels. Since the M
channel has most of the vocal, we can raise the vocal
slightly by raising (for example) the 250 Hz range,
and perhaps also the presence range (5 kHz, for
example) in just the M channel. This brings up the
center vocal with little effect on the other
instruments, and doesn't affect the stereo
separation as much as if we had raised the M/S ratio
of the entire spectrum.

The Weiss EQ-1 has an optional MS
encode/decode which can be placed around the
equalizer section. Raising or lowering the EQ on one
channel of the equalizer affects the stereo
separation. Spread the cymbals without losing the
focus of the snare, tighten the bass image without
losing stereo separation of other instruments, and
so on. The TC Electronic Finalizer 96K's spectral
stereo imager is essentially an MS equalizer "on its
side;" it's an MS width control divided into
frequency bands. See Finalizer image below.

MS Compression. Consider a mix that sounds
great, but the vocal is sometimes slightly buried
when the instruments get loud. If we try
compressing the overall mix, or even narrow band
compression of the vocal frequency range, we might
be disappointed that the compressor action ruins
the great sound of the instruments. MS
compression can help us isolate the compression to
the center or M channel: by only compressing the
M channel, we delicately bring up the center when
signals get loud.² Or compress the M channel and
expand the S, which helps control the vocalist and
open up the band!³ Or, by doing multiband MS
compression, we could keep the bass instrument
from being affected by our vocal range compression.
In other instances, we might achieve that special
kick drum sound by compressing only the low
frequencies of only the M channel. The possibilities
are solely limited by our imaginations.

Patching Order of Processes 

Sometimes it's better to compress before
equalizing. For example, if the EQ is being used to 
enhance the level of some instrument (e.g. if we’re 


The TC Electronic Finalizer 96K is an all-in-one Mastering Processor.












The Cranesong STC-8 is a high quality stereo analog compressor
combined with a peak limiter.


looking for a punchy or thumpy bottom), a 
compressor after the EQ might undo the effect of 
the equalizer by pushing the strongest sound 
downward. 90% of the time my equalizer is patched 
before the compressor; as I make changes in the EQ, 
I alter the compressor’s threshold to retain the same 
action. I almost always put sibilance controllers 
early in the chain, so they will operate with a
constant threshold (sensitivity) regardless of how 
other devices are adjusted. 

IV. An Eclectic Collection 
of Mastering Processors 

Here is a brief (alphabetical) collection of
processors used for mastering at major studios
worldwide. Please do not draw conclusions about
the inclusion or exclusion of a particular unit in this
set; it represents items that either I have used or
which have gained a strong reputation among other
mastering engineers whose ears I trust. Some
additional popular units are described in Chapter 16.

Plug-ins vs. Stand-Alone Processors 

Currently, Sonic Solutions uses proprietary
plug-in formats to preserve the highest sound
quality, so we must feed an external program that
can run plug-ins as an effects loop. Sadie V.5 has a
proprietary plug-in format but also accepts Direct-
X. Ergonomically speaking, plug-ins are a mixed
bag. It's much easier to operate a stand-alone box
with real knobs than a plug-in with a mouse, but
there are also stand-alone processors whose user
interface leaves a lot to be desired. And some plug-
ins feature a user interface which is so ergonomic
that it's a lot easier to adjust the parameters of
multiple channels simultaneously than with any
standalone box. Sonically speaking, plug-ins have
improved tremendously in the past few years,
particularly those Native Plug-ins employing 64-bit
floating point architecture (see Chapter 16). At this
point, the sound quality of a processor is up to its
designer more than whether the process is a plug-in
or an outboard box. However, pressure to reduce
CPU demand often results in Plug-ins with
compromised sound quality.

Classic (and near-Classic) Analog and Digital 
Processors 

The Cranesong STC-8 (image below) is a high 
quality stereo analog compressor combined with a 
peak limiter, and is gaining a reputation amongst
mastering engineers. The STC-8’s compressor’s 
attack and release times are optimized for mastering 














purposes, and it is capable of both emulating vintage
equipment and creating distinctive new sounds. 

The DBX Quantum II is a powerful multi-
function digital processor with up to 96 kHz
operation. All DSP is calculated in 48-bit fixed-
point notation, accurately dithered to 24 bits on its
output for low-distortion sound. It has multiband
and M/S options as well as parametric EQ,
compression, expansion and limiting. One of the
rare dynamics processors which include ratios
below 1 (see Chapter 11), it's particularly valuable
for uncompression. However, I have trouble adjusting
to DBX's approach of naming release time in dB/sec;
I just turn the knob and go by my ears. Since all the
functions are crammed on one LCD screen with
multiple menu levels, ergonomics can be daunting.
This is the case with many such multi-function
units; examine and test the menu structure before
you buy: in the best units, critical functions will be
no more than one or two menu levels below the top.⁴

Exciters

An exciter is a distortion generator. The use of
exciters can often lead to unmusical sonic results
characterized by a too-bright, edgy, fatiguing sound.
I advise mix engineers to avoid using exciters on the
mix bus until mastering in a more controlled
acoustic environment (though moderate use of
exciters on individual instruments can help a mix).
However, the Cranesong HEDD-192 (pictured
below) is a digital processor that has almost no
digititis and thus is in a class by itself. It uses natural
distortion patterns derived from classic analog gear
(see Chapter 16). Other digital exciters include the
SPL Machine Head and Steinberg Magneto, which
are digital processors, the latter being a plug-in.
Analog exciters include the Aphex and BBE. A
number of multifunction boxes contain exciter
modules, including the TC Electronic Finalizer 96
and Drawmer DC 2476 mastering processor,
another multifunction processor.

The Fairchild tube limiter and Pultec equalizers
have not been constructed since the 1960's, but have
attained such legendary status for their fat sound
that I am obliged to mention these unobtainables en
passant. There may be some modern-day
substitutes which do as well or perhaps better, with
cleaner, quieter electronics. If you're looking for the
Pultec or Fairchild sound or beyond, consider units
from Cranesong, Manley, or Millennia.

Top: The DBX Quantum II processor is a multi-function unit with up to
96 kHz operation.

Bottom: Cranesong HEDD-192 Analog Simulator.

Massenburg Equalizer Model GML-9500

George Massenburg is the design engineer for
GML and the inventor of the very concept of
parametric equalization. The model 9500 mastering
equalizer (pictured above) is the mastering version
of the popular 8200 analog parametric, which has
been an industry standard and popular with
mastering engineers for over 20 years. GML also
manufacture an analog dynamic range controller
and a digital noise reduction unit.

Digital Domain Model DD-2 K-Stereo Ambience Recovery Processor

K-Stereo. DSP permits us to accomplish tricks
which were not possible in analog. I invented the
K-Stereo and K-Surround processes to enhance the
depth, ambience, space and definition in stereo
mixes that otherwise would sound small. K-Stereo
extracts existing ambience, giving the mastering
engineer a handle on reverb returns after the mix
has been made. It should be the first enhancement
choice before trying a reverberator, because overall
reverberation can muddy an existing mix, whereas
K-Stereo selectively enhances elements in a mix
which already contain ambience. For example, if a
mix has a wet vocal that needs enhancement but also
has a dry snare drum, K-Stereo will affect the vocal
reverb but not the snare drum. It does this using a
psychoacoustically-based process that's the subject
of a patent application. Digital Domain
manufactures the Model DD-2 K-Stereo Processor
(pictured at left); Z-Systems has licensed the
K-Surround process in the model Z-K6, a 2-channel
to 6-channel converter, and Weiss Engineering has
licensed K-Stereo for a multifunction unit.












Top: The Waves L2 Ultramaximizer.




Middle: Manley Massive Passive
Stereo Equalizer.


Bottom: Manley Stereo Variable 
MU Limiter Compressor. 


The L2 is the first hardware product produced
by Waves and has become an obligatory mastering
limiter (above top). This device helped spawn the
narrow-minded philosophy "I can make anything
louder than you can." However, an exceptional auto-
release and 48-bit processing make the L2 the least
damaging limiter I've encountered. Yes, this is a
left-handed compliment, but the L2 can sound pure
and transparent at low gain-reduction settings. It
also contains Waves' IDR dither, which is among the
better-sounding 16-bit dithers I have encountered,
and an excellent 24-bit A/D converter.


I found the Manley Massive Passive Equalizer 
(pictured middle) to be remarkably transparent and 
quiet for a tube equalizer. It gains its name by
employing a passive equalizer section followed by a
quiet, high-gain tube amplifier. To my ears it has
just the right amount of tube distortion yet retains
clarity without being too "fat.” It also has far more 
versatility than the apparent four bands-per- 
channel because the Q or shape control affects the 
shelving curve as well as the bell, giving the effect of 
a 7 or 8 band equalizer. It's well worth downloading 





























the informative and humorous manual written by 
Manley's versatile Craig "Hutch" Hutchinson. 

A mastering house should have a variety of 
compressors to choose from, since no two sound 
alike, even with similar attack and release settings. 
Several outstanding mastering engineers report that 
the Manley tube Vari-Mu Compressor (bottom 
image, previous page) can help provide desirable 
punch and fatness with modern rhythmic music and 
is a good replacement for the classic Fairchild,
which also employed variable Mu techniques (Mu is
tube shorthand for gain). Distortion can be varied
from very low to screaming by changing the 
input/output gain ratio. 



MaxxBass, a Plug-in from Waves.

MaxxBass. Mixing is a tough job. One problem 
we sometimes encounter is a bass instrument with 
inadequate definition or unclear notes. Obviously 
the best solution is to turn around and remix with 
better EQ or compression on the bass, but that’s not 


always possible. Waves' plug-in called MaxxBass 
(pictured at left) is designed to help clarify the
definition of the bass instrument with minimal 
effect on the rest of the mix. It’s a form of a 
dedicated exciter and a very powerful process that’s 
easy to overuse and dangerous to employ without 
high resolution monitoring. 

This is not the fault of the processor, but a 
limitation of working on any mixed material, since 
it cannot distinguish the bass instrument from the 
toms or the bass drum and if overused, the result 
can be thin-sounding. Essentially the process works 
by low-pass filtering the source, synthesizing
harmonics and then mixing them back into the full 
mix. Don’t try this with a standard exciter, because 
another key to MaxxBass is that it retimes the 
harmonics with the main signal, which is not easy to 
accomplish using external boxes. 

Another use of MaxxBass is to give an
impression of bass response for small systems, by
taking advantage of a psychoacoustic property of the
ear that supplies missing fundamentals when only
the harmonics are present. Watch an old movie on
television and you may not notice that the dialogue
has been sharply high-pass filtered below about 200
Hz. If using MaxxBass for this purpose, be aware
that the sound is tailored for a particular small
system and will not translate to every other. In fact,
the tailored product can sound embarrassingly ugly
if reproduced on a full-range system.

Millennia Media manufactures a Twin Topology
line which can be either tube or solid state at the flip
of a switch. The NSEQ-2 equalizer (pictured)






















Millennia Media NSEQ-2 Tube and Solid State Analog Equalizer.


probably has the shortest internal signal path of any
analog equalizer, with a single DC-coupled solid
state or tube opamp performing the duties of input
conditioning, equalization, and line driving. In
common with many top-of-the-line analog
processors, headroom is exceptional, clipping at +3?
dBu (solid state), and in solid state
mode it is as close to an
analog straight wire with
equalization as I have ever
heard (see Chapter 16).

Measurement Devices and 
Interfaces 

The Metric Halo Mobile
I/O (pictured above right) is a
portable high-resolution
recording studio, and in
conjunction with SpectraFoo, it
serves as a multi-channel FireWire
interface and a portable jitter and spectrum
analyser for digital and analog audio problems.
Attached to a Titanium G4 Powerbook, it’s a highly 
functional portable measurement and analysis 
system. The jitter and distortion analyses in this 
book were made with the MIO and SpectraFoo. 

Another useful portable measurement and
setup device is the Audio Toolbox by Terrasonde.
Complete with measurement microphone, it can be
used to align a monitor system or
simply to send test tones to
external devices.

Reverberation Processors: How Real Can You Get?

A small percentage of the work that comes in for
mastering requires added reverberation. Some
clients have purposely mixed dry because they did
not have access to the quality of reverberation that
we have at the mastering house; but the music must
be of a nature that will not suffer if reverberation is
added to every element. Most mastering requires a
very natural-sounding reverberator, unless we're
looking for a brief special effect. My requirements
for a natural-sounding reverberator include
excellent simulation of the early reflections that
would be present in a real room (see Chapter 17); if
soloing the early reflections, they should
sound natural and be able to stand on
their own. In 1994 I produced a unique
audiophile test CD, Chesky JD111,
containing a dry-versus-wet test that
you can use to evaluate the sound of a
reverberator. I placed a drum set on the stage at
BMG studio A, in front of a single Blumlein
microphone pair. The figure-8 microphone pattern
has equal pickup front and rear, so it captures the
reverb coming from the hall in stereophonic
perspective. But first I closed the thick stage
curtains, isolating the drums to the small stage area,
and recorded a very dry-sounding one-minute 
drum solo (track 25 on the test CD). Then I opened
the curtains, and recorded the solo once again with 
the identical mike, whose rear side picked up the
reverb from the 60 x 40 foot, 2-story high diffuse-treated
room (track 26). Compare the sound of the
real room against any simulator.



TC Electronic Icon remote. Visible on its screen are the equalization capabilities
of one of its four 96 kHz/48-bit 8-channel engines.


Sibilance Controllers (De-Essers) 

Sibilance (exaggerated 's' sounds) is a natural
artifact of compressors as well as bright microphones
and certain mouth and teeth shapes. A
standard compressor exaggerates sibilance because
the compressor doesn't correspond with the
frequency response of the ear; the sibilant is in the
ear's most sensitive frequency range, but typical s
sounds fall below the compressor threshold. The
solution is to employ a very fast, narrowband compressor
working only in the sibilance region (anywhere from 2.5
kHz to as high as 9 kHz in some cases). A standard
compressor can be adapted to a sibilance controller by
equalizing the sidechain, or by using one band of a
multiband compressor. Nearly every multi-function[4]
processor or plug-in manufacturer has a
sibilance controller option, but it's not an easy process
to get right. Listen for artifacts such as distortion
or pumping, or ineffective reduction of the s's. I've
found the best-sounding sibilance control in dedicated
units such as the digital Weiss DS1-MK2, whose attack,
release and filtering characteristics are idealized for processing
premixed material with little or no artifacts. Several
mastering engineers also recommend the analog
Maselec 2012 HF and peak limiter as an excellent
de-esser.


Sintefex Convolution Processor 

Convolution is a mathematical process which 
combines two functions as though one was run
through the other function. A company called
Sintefex uses convolution in its model FX8000
Replicator (pictured below), which some mastering
engineers report can very effectively sample and
duplicate the sound qualities of well-known
compressors, limiters, equalizers and reverberation 
units. Too good to be true? As of this writing, I have 
yet to audition a unit. 



The Sintefex FX8000. Does it really replicate? Many people think so.


The TC Electronic System 6000, TC's flagship
multichannel product, is extremely easy to use (I
figured it out without reading the owner's manual),
has impeccable sound and is modularly
upgradeable. The ICON remote (pictured at left) can
control numerous 6000 mainframes at once. Four
8-channel 96 kHz/48-bit digital engines can
perform artificial reverberation (among the best
that I have heard), compression, expansion,
limiting, de-essing, mixing, noise reduction, delay,
special effects, monitor control and other
processing. It would take an entire chapter to do
justice to all the possibilities of this unit, for which
third-party providers such as GML have written
modules. In addition to digital processing, the
frame contains high-quality A/D/A, whose approach
to jitter reduction I've described in Chapter 19.

Weiss Engineering holds a special place in the
hearts of old-time digital mastering engineers (if
that's not a contradiction in terms), since they
invented the first usable high-resolution digital
processing system, still available as the modular 102
series. The Gambit line of rackmount processors is
designed for superb ergonomics and sound quality.
With a one-knob-per-function philosophy, the
Gambit series feels just like an analog processor,
with the added versatility of memory storage and
MIDI remote control. I analyse the performance of
the dynamics processor DS1-MK2 and the linear
phase EQ1-LP (pictured above right) in Chapter 16;
the latter has become a favorite equalizer. Another
useful device is the model SFC-2 dual synchronous
sample rate converter, which I often use to up- and
down-sample (see Chapter 1).

Z Systems ZQ-2 is a 6-band stereo digital
equalizer that sounds very clean and relatively
undigital (pictured at right). I analyse its near
textbook-perfect performance in Chapter 16.

Z Systems Z-link 96+ is an asynchronous
sample rate converter (ASRC) employing the Analog
Devices 1896 chip. We can use it to monitor CDs if
the system's DAC/Master clock is not at 44.1 kHz, so
as not to disturb the delicate lock between
processors which are locked at a different rate.



Below: EQ1-LP 7-band linear-phase equalizer




Below: Z-Systems ZK-6 6-Channel K-Surround Processor 














































Z-Systems also manufacture digital surround
processors including the aforementioned ZK-6 K-Surround
processor (pictured previous page), which
converts 2-channel material to 6-channel, a 5.1
compressor and equalizer, as well as the ubiquitous
digital routers described in Chapter 2.




1 The formally correct formulas are:

Encode:

M = 0.5 * (L + R), which is 6 dB less than the mono sum. The encoder sums
and attenuates by 6 dB.

S = 0.5 * (L - R), which is 6 dB less than the mono difference. The encoder
takes the difference and attenuates by 6 dB.

Decode:

L = M + S

R = M - S. Be aware that an MS encoder and decoder are identical except for the
amplitude, and if you use a typical encoder to decode, you will have to raise the
level by 6 dB.
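
The encode and decode formulas in footnote 1 can be sketched in a few lines of Python. This is only an illustration: the function names `ms_encode` and `ms_decode` and the sample values are hypothetical, and single sample values stand in for whole audio channels.

```python
import math

def ms_encode(left, right):
    """Sum/difference encode with the 0.5 (about -6 dB) scaling from the footnote."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """Decode M and S back to left and right."""
    return mid + side, mid - side

# Hypothetical sample values for one instant of a stereo signal.
left, right = 0.75, -0.25
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)

# Using the encoder as a decoder returns L and R at half amplitude,
# so about 6 dB of gain is needed to restore the original level.
half_left, half_right = ms_encode(mid, side)
makeup_db = 20.0 * math.log10(left / half_left)

print(decoded_left, decoded_right, round(makeup_db, 2))
```

The round trip returns the original left and right values, and the makeup gain comes out at roughly 6 dB, which is the level difference the footnote warns about.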

2 Remember that a downward compressor brings sound down when it goes over
the threshold, so the actual loudness increase of the compressor is
accomplished by raising the gain makeup control. In the MS case, very slight
compression, say 0.5 dB, may be all that is necessary to control that "lost"
vocalist above the band.

3 If a unit which allows downward compression of M and upward expansion of S
is not available, I may compress the M channel in one unit and then upwardly
expand both channels in another; when properly adjusted, the net result is the
same as if I had compressed the M channel and expanded the S.

4 The best way to take advantage of multifunction boxes is to load an existing
preset, then bypass nearly all the unnecessary and often exaggerated settings
that manufacturers habitually toss in, and save the preset as a blank slate.
Apparently they can't sell a box to its intended market without presets, but the
preset concept is foreign to the way in which mastering engineers work,
especially a preset ludicrously named Reggae, Rock and Roll or Smooth Jazz.
How can they give you a setting without having heard the recording you are
working on?



We'll fix it in the mastering.

- Anon



PART III: ADVANCED THEORY & PRACTICE 


Making good sound is like preparing good food.
If you overcook, it loses its taste.


— Bob Katz 



Chapter 14
How To Make Better Recordings in the 21st Century
Part One: Monitor Calibration

I. Introduction


Calibrated monitors are the critical tools of the
21st century audio engineer. Some engineers think
(mistakenly) that the need for monitor calibration is
only for making 5.1 theatrical mixes. But we'll all
make better recordings if we use calibrated stereo or
surround monitors. A good-sounding monitor
system does not come out of the box; it takes work
and care. But after the work is done, there's nothing
like the pleasure of hearing great-sounding music!

What is a Calibrated Monitor System? 

A calibrated monitor system is one that is 
adjusted to a known standard gain and frequency 
response. The monitor gain control is repeatable 
and marked in decibels. Repeatable means that you
can return the monitor to a particular gain at any
time, and calibrated means that the standard decibel
markings on the monitor scale mean the same thing to any
engineer, whether in Calcutta, New York, or Hong Kong.

This will help us collaborate, to be more consistent in
our work, and to produce mixes that will perform
together when later assembled at the mastering
house. As we shall see, the absolute value of the
numbers also defines the sound quality of the mix
that will result.




Monitor controls, today and tomorrow. Tomorrow's monitor control will be
marked in 1 dB steps, and the 0 dB position will be calibrated to the
SMPTE RP 200 standard (to be explained).




II. Getting Rid of Slippery Language 

21st century audio will be integrated with
television, home theater, computer audio, computer
games, and music playback, often all coming from a
central source. During the last century most of us
worked in uncalibrated listening rooms, adjusting
our recording levels as we pleased, and just turning
the monitor knob until it sounded "loud enough."

Try this: Put your favorite high-end effects
movie into the DVD player, and adjust the loudness
for a big, enjoyable presentation. Next, put one of
last year's hypercompressed pop-music CDs
into the same player. Watch out when you hit
PLAY, because the loudness will be
overbearing and in danger of damaging components
and your ears. No wonder the consumers are
beginning to complain. We can no longer produce
recordings in isolation without regard to monitor
calibration, since the same consumer equipment
that plays DVDs will also play compact discs, videos,
MP3s, DVD-As and SACDs!

This is why, in the 21st century, we need to learn
how to adjust our monitor gain first to a known
standard, and then make the recording fit to that
gain. One obstacle is the slippery daily language that
we use to describe audio.


"Level is often confused with Gain."


So to avoid confusion, the first step is to pick
words that mean the same thing to everyone. Here is
a brief glossary of the language of levels:*

Volume... usually associated with an audio
level control, is an imprecise consumer term with
no fixed definition. The words more properly used
in the art are Intensity and Loudness.

Intensity... (aka SPL, Level, Pressure) a 
measure of the amplitude or energy of the physical 
sound present in the atmosphere. 

Loudness... is used specifically and precisely
for the perceptual level created inside the
listener's brain. Psychoacousticians can create
subjective experiments that measure loudness, and
have found that loudness versus intensity is quite
similar across a population of listeners. However,
loudness is much more difficult to measure in a
metering system; in fact, it's best presented as a
series of numbers rather than as one overall
"loudness." Because of the big difference between
typical metering systems and our perception, two
pieces of music that measure the same on an SPL or
VU meter can have drastically different loudness,
depending on many factors, including transient and
frequency response, and the duration of the sound.
Exposure time affects our perception; after a five
minute rest, the music seems much louder, but then
we get used to it again, which is good reason to keep a sound
pressure level meter around to keep us from
damaging our ears.

Level... is a measure of intensity, but when 
used alone means absolutely nothing, because it can 

* Thanks to Jim Johnston (in correspondence) for helping to clarify some of these
definitions.



mean almost anything! To avoid confusion, always
accompany level with another defining term, e.g.
voltage level, sound pressure level. Level is very often
confused with Gain. Engineers can have a whole
conversation about "levels" and not even know what
they're talking about, unless they clearly distinguish
gain from level.

Sound Pressure Level (SPL)... is one of the
units of intensity. SPL measurements can be
repeatable if taken in the same fashion.* 74 dB SPL
is the typical sound intensity of spoken word 12
inches away, which increases to 94 dB SPL at one
inch distance. While we often see language like 95
dB SPL loud, this usage is both inaccurate and ill-defined,
as loud refers to the user's perception, and
SPL to the physical intensity.

Decibels are always expressed as a ratio 

A decibel (dB) is always a relative quantity; it's
always expressed as a ratio, compared to a reference.
For example, what if every length had to be
compared to one centimeter? You'd say, "this piece
of string is ten times longer than one centimeter."
It's the same thing with decibels, though sometimes
the reference is implied. +10 dB means "10 dB more
than my reference, which I defined as 0 dB."

Decibels are logarithmic ratios, so if we mean "twice
as large," we say "6 dB more" [20 * log (2) = 6].
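
The 20 * log rule above is easy to check numerically. The following is a small sketch in Python; the helper names are hypothetical, and it assumes the ratio is an amplitude ratio (voltage or sound pressure), which is the case the 20 * log formula covers. Power ratios use 10 * log instead.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Decibels for an amplitude (voltage or pressure) ratio."""
    return 20.0 * math.log10(ratio)

def db_to_amplitude_ratio(db):
    """Amplitude ratio for a given number of decibels."""
    return 10.0 ** (db / 20.0)

for ratio in (2.0, 10.0):
    print(f"{ratio:g} times the reference = {amplitude_ratio_to_db(ratio):.2f} dB")

print(f"+10 dB = {db_to_amplitude_ratio(10.0):.3f} times the reference")
```

Twice the amplitude comes out at about 6.02 dB, which is normally rounded to the "6 dB" quoted in the text.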

* SPL measurements must include the weighting curve used, e.g. A or C, the
speed of the meter (slow or fast), and method of spatial averaging (how many
mikes were used and how they were placed).

dBu, dBm, dB SPL, dBFS... are expressions of
decibels with defined references. I believe the term
dBu was introduced in the 1960's by the Neve
Corporation, and it means decibels compared to a
voltage reference of 0.775 volts. dBm means decibels
compared to a power reference of one milliwatt. dBFS
means decibels compared to full scale PCM; that is, 0
dBFS represents the highest digital level we can encode.
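
To make the references concrete, here is a minimal Python sketch of the three conversions. The function names are hypothetical, and the dBFS helper assumes samples normalized so that full scale is 1.0 (an assumption for illustration; a 16-bit integer system would use its own full-scale value).

```python
import math

DBU_REFERENCE_VOLTS = 0.775   # 0 dBu
DBM_REFERENCE_WATTS = 0.001   # 0 dBm is one milliwatt

def volts_to_dbu(volts):
    return 20.0 * math.log10(volts / DBU_REFERENCE_VOLTS)

def dbu_to_volts(dbu):
    return DBU_REFERENCE_VOLTS * 10.0 ** (dbu / 20.0)

def watts_to_dbm(watts):
    # Power ratios use 10 * log, not 20 * log.
    return 10.0 * math.log10(watts / DBM_REFERENCE_WATTS)

def sample_to_dbfs(sample, full_scale=1.0):
    # Assumes normalized samples; full_scale is the largest encodable magnitude.
    return 20.0 * math.log10(abs(sample) / full_scale)

print(f"+4 dBu is about {dbu_to_volts(4.0):.3f} volts")
print(f"1 mW is {watts_to_dbm(0.001):.1f} dBm")
print(f"A half-scale sample is {sample_to_dbfs(0.5):.2f} dBFS")
```

Note that each function needs its reference to produce a number, which is the point of the paragraph above: a decibel figure with a suffix is a level, while a bare decibel figure is only a ratio.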


Gain or Amplification... is always a relative
term expressed in plain decibels, the ratio of the
amplifier's output level to the input. It is wrong to
use an absolute level (e.g. dBu or dBm or dBv) with
the term gain. It is sufficient to say that an amplifier
has, for example, +27 dB gain, and a nominal output
level of +4 dBu when fed with a
given level source, as in this figure.

The meaning of Gain vs. Level. An amplifier with
27 dB gain is fed an input signal whose level
is -23 dBu to yield an output level of +4 dBu.
The decibels of gain should never need a suffix.
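
The figure's arithmetic can be written out as a tiny Python sketch. The function names are hypothetical; the only assumption is that levels are in dBu and gain is in plain dB, as in the caption.

```python
def output_level_dbu(input_level_dbu, gain_db):
    """A level (dBu) plus a gain (plain dB) gives another level (dBu)."""
    return input_level_dbu + gain_db

def gain_db_between(input_level_dbu, output_level_dbu_value):
    """The difference between two levels is a gain, with no suffix."""
    return output_level_dbu_value - input_level_dbu

print(output_level_dbu(-23.0, 27.0))   # level at the amplifier output, in dBu
print(gain_db_between(-23.0, 4.0))     # gain of the amplifier, in dB
```

Adding a gain to a level yields a level; subtracting two levels yields a gain. Keeping the two kinds of quantity in separately named variables is one simple way to avoid the confusion the text describes.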


Monitor Gain vs. Monitor Level. Similarly,
the sound pressure level from your monitor
loudspeakers is often confused with the monitor
gain. In fact, the term monitor gain is so slippery that
I have started using a much more solid term that
everyone seems to understand: MONITOR
POSITION. For example, we say "the monitor
control is at the 0 dB position."


Average vs. Peak. As we learned in Chapter 5,
the instantaneous peak level of a good recording can
be as much as 20 dB greater than its average (long
term) level. Generally, we measure average sound
pressure level with a sound level meter; sometimes
we look at the peak level. For monitor calibration,
the SPL meter should use the RMS averaging
method, as opposed to a simple average (mean);
simple averaging can produce as much as 2 dB error.
Unless otherwise specified, when we say average in
this book, we are referring to the RMS-measured
level as opposed to the peak level.
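
The size of the error between RMS and simple averaging can be explored with a short Python sketch. Two assumptions are mine, not the book's: "simple average" is taken to mean the mean of the rectified (absolute) signal, and white Gaussian noise from the standard library stands in for a noise-like test signal.

```python
import math
import random

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def mean_abs(samples):
    # "Simple average" interpreted as the mean of the rectified signal.
    return sum(abs(x) for x in samples) / len(samples)

def gap_db(samples):
    """How many dB the RMS reading sits above the rectified-mean reading."""
    return 20.0 * math.log10(rms(samples) / mean_abs(samples))

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 0.1) for _ in range(200_000)]
sine = [math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]

noise_gap_db = gap_db(noise)
sine_gap_db = gap_db(sine)
print(f"noise: {noise_gap_db:.2f} dB, sine: {sine_gap_db:.2f} dB")
```

Under these assumptions the gap is close to 2 dB for Gaussian noise and a little under 1 dB for a sine wave, so a meter calibrated on one kind of signal with simple averaging will misread the other.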

Crest Factor is the difference between the
average level of a musical passage and its instantaneous
peak level. For instance, if a fortissimo
passage measures -20 dBFS on the averaging meter
and the highest momentary peak is -3 dBFS on the
peak meter, it has a crest factor of 17 dB.
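
As a worked example of the definition, this Python sketch computes crest factor both from meter readings and directly from samples. The function names are hypothetical, and the sine wave is only a convenient test signal whose crest factor is known to be about 3 dB.

```python
import math

def crest_factor_db(peak_dbfs, average_dbfs):
    """Crest factor from a peak meter reading and an averaging meter reading."""
    return peak_dbfs - average_dbfs

def crest_factor_from_samples(samples):
    """Crest factor in dB computed from the samples: peak over RMS."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# The fortissimo passage from the text.
print(crest_factor_db(-3.0, -20.0))

# One cycle of a full-scale sine wave.
sine = [math.sin(2.0 * math.pi * k / 1000) for k in range(1000)]
print(round(crest_factor_from_samples(sine), 2))
```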

III. Using A Calibrated Monitor System 
for Level and Quality Judgment 

An experienced engineer can make a good
mixdown just by listening and without looking at the
meter. The key is understanding how to use the
calibrated monitor control. In simple terms, the
monitor level control is calibrated so that the 0 dB
position produces 83 dB SPL with a pink noise
calibration signal (to be explained). The recorded
level of this calibration signal is set to -20 dBFS RMS
(20 dB below full scale digital). What this means is
that a comfortably loud average SPL has been set to
20 dB below the peak system level. Since the ear
generally judges loudness by average level, and the
most extreme crest factor anyone has measured for
normal music is 20 dB, then our peak level will
never overload!* Typical mixed material has crest
factors from 10 to 18 dB, so this mixdown may reach
peaks from -10 to -2 dBFS, more than adequate
levels for 24-bit recording, as shown in Chapter 5.
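
The headroom arithmetic in this paragraph can be laid out as a small Python sketch. The constant and function names are hypothetical; the numbers (83 dB SPL average at -20 dBFS RMS with the monitor at the 0 dB position) come from the text.

```python
REFERENCE_AVERAGE_DBFS = -20.0   # recorded level of the calibration signal
REFERENCE_AVERAGE_SPL = 83.0     # acoustic level it produces at the 0 dB position

def peak_dbfs(crest_factor_db):
    """Recorded peak level when the average sits at the reference level."""
    return REFERENCE_AVERAGE_DBFS + crest_factor_db

def peak_spl(crest_factor_db):
    """Acoustic peak level for the same material."""
    return REFERENCE_AVERAGE_SPL + crest_factor_db

for crest in (10.0, 18.0, 20.0):
    print(f"crest {crest:g} dB: peak {peak_dbfs(crest):g} dBFS, {peak_spl(crest):g} dB SPL")
```

Even the most extreme 20 dB crest factor only just reaches 0 dBFS, which is why material mixed by ear at this monitor position should not overload.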

What this means is that a high monitor position
will permit us to produce music with high crest
factor. Conversely, as you lower the monitor control
position, you tend to raise the average recorded
level to produce the same loudness to the ear. In the
20th century, we approached this from the opposite
way; as we raised the average recorded level, we were
forced to turn down the monitor to keep our ears
from overloading!

* Assuming the mix engineer's ears have normal sensitivity to loud sounds. While
no mix engineer works without glancing at the peak meter, you get my point.

When monitor gain is calibrated so average SPL is 83 dB at -20 dBFS,
and you then mix by the loudness of the monitor, then the music
will never overload and you will never have to look at a record
level meter!

Peak SPL level: 103 dB
Peak recorded level: 0 dBFS (full scale)
Average SPL level: 83 dB
Average recorded level: -20 dBFS

Monitoring by the numbers 

Judging Loudness. If we become familiar with 
how various known recordings reproduce on our 
calibrated system, and the monitor position we use
to reproduce those recordings, then we can judge 
the absolute loudness of any master in the making 
just by noting the monitor position, without having 
to compare it with other known recordings. 

Judging Sound Quality. As the average level 
increases and approaches the peak level, more 
compression and peak limiting will be required to 
keep the medium from overloading. As we
described in Chapter 10, some amount of
compression can enhance a recording, but extreme
compression is self-defeating: it lowers the crest
factor and dilutes the clarity, impact, spaciousness,
and liveliness of the presentation. It's ironic that
mastering engineers are being asked to do some
damage to recordings in the name of loudness. Of
course, the point where damage occurs is subjective
and depends a lot on the music and the message, but
we all agree there is such a thing as too much.

Work to a predetermined and fixed monitor
gain. In the 21st century of mastering, we should
work to a predetermined and fixed monitor gain; if
the music becomes too loud, turn down the amount
of processing or the output of the processors rather
than turn down the monitor! We should use the
measured position of the monitor control as a guide
to the sound quality we are probably going to
produce. In other words, if we find the monitor
control drifting down too far, our recording is also
probably deteriorating. The 0 dB position is typically
necessary to reproduce audiophile classical and
acoustic jazz recordings that have used no
compression or limiting. I've found that the -6 dB
position (corresponding with a crest factor of about
14 dB) is the lowest monitor gain that still produces
a high-quality musical product with typical pop
music, and most of the pop music recorded in the
last century until about 1993 sounds "just right" at
the -6 dB position. Slowly but surely, as we are
forced to turn the monitor below -6 dB to keep a
comfortable loudness, the sound quality is reduced.
By working hard, I can make masters geared for -7
or -8 dB monitor position that still sound pretty
good.* But some current hypercompressed pop CDs
exceed this loudness by as much as 6 more decibels!
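
The link between monitor position and crest factor can be sketched with a deliberately simplified model in Python. The model is my assumption, not a formula from the text: it supposes that each decibel the monitor is turned down is matched by one decibel of extra average recorded level to keep the loudness constant, and that peaks cannot exceed 0 dBFS. It reproduces the one data point the text gives (-6 dB position, about 14 dB crest factor).

```python
REFERENCE_AVERAGE_DBFS = -20.0   # average recorded level at the 0 dB position

def average_level_dbfs(monitor_position_db):
    """Average recorded level that gives the reference loudness at this position."""
    return REFERENCE_AVERAGE_DBFS - monitor_position_db

def max_crest_factor_db(monitor_position_db):
    """Largest crest factor that still fits under 0 dBFS in this simplified model."""
    return 0.0 - average_level_dbfs(monitor_position_db)

for position in (0.0, -6.0, -8.0, -14.0):
    print(f"position {position:g} dB: average {average_level_dbfs(position):g} dBFS, "
          f"room for {max_crest_factor_db(position):g} dB of crest factor")
```

In this model a master that needs the monitor at -14 dB (6 dB beyond the -8 dB position mentioned above) leaves room for only about 6 dB of crest factor, which gives a sense of how much limiting such a master implies.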


* Some monitors are marked in "SPL," which designers think is very sophisticated.
However, it's very misleading. This is a classic case of confusing gain with
level. The 83 marker is meaningless after calibration.


Monitor gain for mixing versus mastering.
Mixing and mastering should be collaborative
processes. I recommend that you be conservative
with average levels during mixing, so as not to
deteriorate the recording, for we cannot restore
quality that has been lost. When mixing pop music,
set your monitor position from 0 dB to no lower
than -6 dB to make a recording that falls in line with
the vast majority and still has good clean transients;
it will help you produce a recording with life and
acceptable dynamic range for home and car
listening. You will still be able to be creative with
compression and other effects; a fixed monitor gain
is liberating, not limiting. When such a well-made
recording arrives for mastering, we have much more
freedom; we will raise the apparent loudness if we can
do so while preserving or
enhancing the recording's virtues, but the clarity
and beauty of the recording will not have been
ruined prior to arrival at the mastering house.

A fixed monitor gain is liberating, not limiting.


Different Size Rooms. Note that room volume 
and number of loudspeakers affect the apparent 
loudness of a system. The more loudspeakers, the 
louder the system for the same monitor control 
position. I determined these recommended 
monitor control positions in a large stereo 
mastering room with loudspeakers 9 feet from the 
listener. In an extra large theatre, as much as 2 dB 
additional gain may be needed, whereas in a small 


169 How To Make Better 
Recordings: Parí One 



remóte truck with loudspeakers a couple of feet 
from the listener, as much as 3 dB less gain may be 
necessary. Set your standards accordingly. 

IV. Setting Up and Calibrating 
the System 

Summary of Essential Tools 

Now that we know the benefits of having a
calibrated monitor, let’s see what tools we need to 
construct a good-sounding, calibrated monitor 
system. 

• A great room, whose dimensions, wall 
construction and layout have minimal 
obstructions/reflections between the 
loudspeakers and the listener, with low noise and 
good isolation from the outside world. 

• For surround sound, five matched "satellite"
loudspeakers and amplifiers with flat frequency
response (preferably good down to 60 Hz), high
headroom, each capable of producing at least 103
dB SPL before clipping. To repeat the adage from
Chapter 6, high headroom monitors are necessary
to make proper sound judgments: if our monitors
are compressing, we cannot judge how much
compression to use in the recording.

• One (preferably two) subwoofers, capable of
extending the low frequency response of all the
satellites down to about 35 Hz, and producing at
least 113 dB SPL at low frequencies before
clipping.

• A low distortion monitor matrix with versatile and
flexible bass management, capable of repeatable,
calibrated monitor gains, and of down mixing and
comparing sources from 7.3 through mono. With
this, we can confidently produce recordings that
can be interchanged with the rest of the world,
and sound wonderful on systems large and small.

• A monitor selector to feed the matrix, with both
digital and analog inputs.

• Measurement/calibration equipment:

Preferable: A calibrated 1/3 octave real time
analyzer (RTA) and microphone(s), with multiple
memories, selectable response speed, and ability
to integrate several microphone locations (spatial
averaging).

Alternate (less accurate): A high quality sound
level meter with calibrated microphone,
selectable filters and response speed.

Test Signals: If using a sound-level meter, then
you need RMS-calibrated sources of filtered pink
noise. If using a 1/3 octave RTA, then you can use
ordinary wide-band RMS-calibrated pink noise.

• And let's not forget the most critical ingredient:
Knowledge. The services of a trained
acoustician may be needed on first-time setup,
to perform anechoic and early-reflection analysis
of the room and loudspeakers, interpret the
causes of measured frequency response errors and
their audible significance, and suggest
acoustically-based cures.

Placing the Main Loudspeakers 

The ideal reproduction system should have no 
obstacles in the path between all the loudspeakers 
and your ears. This certainly turns most recording 
consoles and outboard racks into serious problems 
and is the reason why my rack gear is in the back
corner, and my listening couch is placed in front of
the computer and DAW. This forces me to go behind
the ideal listening position when doing heavy
editing, but all critical listening and remote control
of transports and processors can be accomplished
from the couch where there is little or no acoustical
interference between loudspeakers and ear.

The Rope (Clothesline) Procedure 

Tom Holman* describes how two pieces of string
can be used to set up your monitors at the proper
distances and angles to conform with the ITU 775†
recommendation, illustrated below.

Here's a step-by-step embellished recipe. All
speakers are equidistant from the center of an
imaginary circle, with the center front being 0°,
front left and right speakers at +/- 30°, and the
surround speakers at +/- 110° (ITU accepts
surrounds between 100° & 120°). Start with a long
piece of rope or clothesline (which doesn't stretch
so easily) a little longer than 3 times the length of
the proposed distance to one loudspeaker. Tie one
end to a mike stand located at the center of
the circle (the prime listener). Run the rope to the
approximate proposed position of the right front
speaker, and put a piece of black tape on the string to mark the
radius of the circle (see 1). Then fold the long rope
at the tape and add two more pieces of tape to mark
three identical length sections. This radius is our
"standard length," and equals 60° of angle when it
runs between two points of the circle.

* Holman, Tomlinson (2000) 5.1 Surround Sound: Up and Running. Focal Press.
† International Telecommunication Union, specification ITU-R BS.775-1

Spread the marked rope to create an equilateral
triangle (see 1, 2, 3), and now mark the floor at the
points for the left front and right front speakers. Cut
the rope at the first tape to leave a radius that can
swing from the central mike stand. To find the
center speaker location, fold a standard length of
the remaining rope in half and mark its midpoint.
Use that rope to find the midline between the LF
and RF speaker and temporarily mark the floor
there. Then cross the radius rope over this
centerline and mark the position for the center
speaker at the end of the radius rope (see 4).

How to find 110° without a protractor? Use a
standard length rope reaching from RF (see 5) and
temporarily mark the spot where it meets the radius
rope. This is at 30° + 60° = 90°. Now divide a
standard length rope in thirds (see 6), run it from
the 90° spot and mark where this 1/3 distance meets
the radius rope. This is 90° + 20° = 110°, for SR. Do a
mirror image of this procedure to find SL, and
you're done!
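
The geometry behind the rope procedure can be checked with a short Python sketch. It relies on the standard chord relation (a chord of length c on a circle of radius r subtends an angle of 2 * asin(c / 2r) at the center); the function name and the 9 foot radius are hypothetical values chosen for illustration.

```python
import math

def chord_angle_deg(chord_length, radius):
    """Angle at the center of the circle subtended by a chord of this length."""
    return math.degrees(2.0 * math.asin(chord_length / (2.0 * radius)))

radius = 9.0          # feet; hypothetical distance from listener to each speaker
front_right = 30.0    # degrees from center front

# A rope of one standard length (equal to the radius) stretched from RF.
ninety_spot = front_right + chord_angle_deg(radius, radius)

# A rope of one third of the standard length stretched from the 90 degree spot.
surround_right = ninety_spot + chord_angle_deg(radius / 3.0, radius)

print(round(ninety_spot, 2), round(surround_right, 2))
```

The one-third rope adds a little over 19° rather than exactly 20°, so the surround lands at roughly 109°, which is comfortably inside the 100° to 120° range the recommendation accepts.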

Physically place the subwoofers just in front of,
and slightly outside the centerlines of the satellites.
Later you may "tweak" the position of the
subwoofers for the flattest response at the listening 
position and best integration with the satellites. 



The ITU 775 recommendation for 5 channel 
loudspeaker placement. 









Connecting and calibrating the system levels 

The 5.1 monitor system has six outputs, which
should be connected to the inputs of the
corresponding loudspeaker/amplifiers. I'm going to
be describing a system using true stereo
subwoofers. One way to connect such a system takes
advantage of a subwoofer with two inputs (which
most of them have), as illustrated below. You will be
using some of the bass management built into the
sub and some built into the monitor matrix.


Connecting a monitor matrix with 
stereo subwoofers. By using the 
dual inputs of each sub, we can
still have a mono LFE signal (the
.1 channel) and stereo bass from 
the front main speakers. 



You will choose the low-pass setting on the
subwoofer which produces the most seamless
"splice" to the satellites; ideally as low as 40 Hz, but
some systems need as high as 80 Hz. This depends
on the low frequency response of the satellites.[1]
Start with the frequency recommended by the
manufacturer and later you can tweak according to
your room response measurements, as I will
explain. Set the woofer polarity to
normal and the initial phase setting to
0 degrees (if the woofer has a
continuous phase control). The phase
control on the subwoofer lines up the
apparent distance of the sub with that
of the satellites. Leave the woofer
phase at 0° if your monitor matrix has
delay compensation; if the sub is
closer than the satellites, add time
delay to the sub based on 1 ms = 1 foot.
Later this can be fine-tuned,
preferably using time-delay spectrometry, or the
real-time analyzer. If your room geometry does not
permit the surrounds to be the same distance from
the ear as the front speakers, then you can delay the
appropriate sets of speakers to match.
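
The delay rule of thumb can be sketched in Python as follows. The function name, the example distances, and the speed of sound figure (about 1125 feet per second, roughly 343 m/s at room temperature) are my assumptions for illustration; the 1 ms per foot rule is the one given in the text.

```python
SPEED_OF_SOUND_FT_PER_S = 1125.0   # assumed value at about 20 degrees C

def delay_ms(farthest_distance_ft, speaker_distance_ft, rule_of_thumb=True):
    """Delay to add to a closer speaker so it lines up with the farthest one."""
    difference_ft = farthest_distance_ft - speaker_distance_ft
    if difference_ft <= 0:
        return 0.0                 # this speaker is not closer; no delay needed
    if rule_of_thumb:
        return difference_ft * 1.0                         # 1 ms per foot
    return difference_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0

# Hypothetical layout: satellites 9 feet away, subwoofer 7 feet away.
print(delay_ms(9.0, 7.0))                                  # rule of thumb
print(round(delay_ms(9.0, 7.0, rule_of_thumb=False), 2))   # from speed of sound
```

The rule of thumb slightly overestimates the physical delay, which is one reason to treat the first setting as a starting point and fine-tune it by measurement as described above.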


Now let's check the integrity of each
connection. Turn the monitor gain control down
all the way! Feed a calibrated, uncorrelated*, 5-channel
pink noise source at a level of -20 dBFS
RMS into all digital inputs of the system, advance
the monitor gain and the trim adjustment on each
loudspeaker just a small amount to verify it's
operating. Then, solo each output in turn and verify
it's getting to the correct speaker.

SMPTE RP 200 Level Calibration

Now we'll be producing some loud test signals, so we suggest
putting on earplugs. Place a calibrated measurement microphone
pointing directly upwards, at ear height at the central listening
position. Connect this to your 1/3 octave RTA. Set the RTA to an
averaging time between about 3 and 10 seconds, and wait at least
that long before taking any reading. Turn the loudspeaker trim
controls down all the way! Set the master monitor level to the
0 dB (reference) position. Now, solo ONLY the left loudspeaker.
Slowly turn up the left trim gain until the midband energy
(particularly in the 1 kHz band) reads 68 dB SPL (68.2 dB for
perfectionists; see note 2). If all the individual bands were flat
at 68 dB SPL, they would sum mathematically to 83 dB SPL, which is
the SMPTE RP 200 standard. Inspect the RTA for a generally smooth
shape with peaks and dips ideally less than plus or minus 3 dB. If
any band has a significant peak or dip, it's time to consult an
acoustician! Generally I prefer to solve frequency anomalies with
acoustic solutions first rather than equalization. Don't be
concerned at this time about the absolute flatness of the high
end, which will be rolled off.

*Uncorrelated means there is a random, or no continuous,
relationship between channels. Correlated means there is some
relationship. If the same mono source is fed to all channels, then
they are 100% correlated.
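
The step from 68 dB per band to 83 dB broadband is a power sum of the individual bands. The sketch below illustrates that arithmetic under one stated assumption: 30 equal 1/3-octave bands (for example 25 Hz to 20 kHz), which is the count that reproduces the 14.8 dB figure quoted in note 2. The function name and band count are illustrative, not taken from the RP 200 document.

```python
import math


def sum_levels_db(levels_db):
    """Power-sum a list of uncorrelated band levels given in dB SPL."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


# Assumption: 30 one-third-octave bands, all reading the same level.
N_BANDS = 30

band_to_broadband_offset = 10.0 * math.log10(N_BANDS)   # about 14.8 dB
broadband_at_68_2 = sum_levels_db([68.2] * N_BANDS)     # about 83 dB SPL
broadband_at_70_2 = sum_levels_db([70.2] * N_BANDS)     # about 85 dB SPL

if __name__ == "__main__":
    print(round(band_to_broadband_offset, 1))
    print(round(broadband_at_68_2, 1), round(broadband_at_70_2, 1))
```

A real room will not be flat in every band, so this only explains where the target numbers come from; it is not a substitute for the measurement.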

Repeat this procedure for each of the 5 main loudspeakers, sending
pink noise one channel at a time. If 68 dB is not an easy value to
"read" with your RTA, then you may, for example, raise the pink
noise to -18 dBFS RMS, which should result in 70 dB SPL per 1/3
octave band and (if all bands were equal) would sum to 85 dB SPL
broadband. Remember, it's far more accurate to use the midband
level measured with a 1/3 octave analyzer than a wideband SPL
measurement, due to variations in microphone off-axis response,
low frequency room resonances, filter tolerances, and so on. The
alternative is to use a sound level meter with a band-limited
500 Hz to 2 kHz signal calibrated to -20 dBFS RMS, to read 83 dB
SPL. If only full range pink noise is available and an RTA is not
available, an alternative method (though less accurate, with as
much as 2-3 dB possible error) is to use a wideband SPL meter set
to C weighting, slow response.

Note that the theatrical standard adjusts the 
surrounds each to 3 dB below the fronts, but for 
home music production, all five loudspeakers 
should have the same gain. 

Total Sound Level 

The subwoofers have not yet been calibrated and are turned down
all the way. Five uncorrelated sources should sum approximately
7 dB higher than an individual channel. Release the solo button
and verify that all five main speakers are operating, and the SPL
in the midband rises about 7 dB (+/- 1 dB). If not, then one or
more of your cables may be wired out of polarity, speaker
distances or level calibration could be off, or a component is
defective.
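
The 7 dB figure follows from adding the power of five equal, uncorrelated sources. Here is a minimal sketch of that check; the helper names and the example readings are hypothetical, and the +/- 1 dB tolerance is the one given in the text.

```python
import math


def uncorrelated_rise_db(n_sources):
    """Expected level rise when n equal, uncorrelated sources play together."""
    return 10.0 * math.log10(n_sources)


def rise_is_plausible(single_db, all_db, n_sources=5, tol_db=1.0):
    """True if the measured rise is within tol_db of the expected rise."""
    measured_rise = all_db - single_db
    return abs(measured_rise - uncorrelated_rise_db(n_sources)) <= tol_db


if __name__ == "__main__":
    print(round(uncorrelated_rise_db(5), 2))   # expected rise for five speakers
    print(rise_is_plausible(68.0, 75.2))       # hypothetical healthy reading
    print(rise_is_plausible(68.0, 72.0))       # hypothetical fault: rise too small
```

A rise well short of the expected value is the symptom the text describes: a polarity, distance, level or component problem somewhere in the chain.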

Phantom Center Check 

Now let's check the phantom center produced by an in-phase mono
signal when listening at the central position. This confirms the
front main speakers are in polarity and there are no acoustic
anomalies. Turn the pink noise off and turn the monitor control to
about -10. Change the pink noise source to mono, that is, the same
signal to all channels. Solo both left and right front
loudspeakers. Now remove your earplugs, turn on the mono pink
noise and verify the phantom center appears as a fairly narrow
virtual image at the physical location of the center loudspeaker.
You might tweak the angles (toe-in) of the speakers until the
phantom image is narrow in the critical midband. If the image is
off-center, recheck the left/right gains and speaker distances.
Try tweaking one channel's trim up or down slightly to recenter
the image, then return to the previous section and recheck the
measured left/right gains to verify they match acoustically within
+/- 0.1 dB in the 1 kHz band. Loudspeakers must be well-matched to
produce an excellent phantom center.

Now compare the sound of the phantom center 
with that of the center speaker itself, by alternating 
between soloing the center or the two sides. The 
center speaker should sound a little brighter, but 
the position of the pink noise should not change if 
you are sitting in the center and all speakers are
equidistant from the listener. 




Bass Management 

Integrating a subwoofer or pair of subwoofers to 
extend the response of a stereo system is an art and 
a science. Extending that idea to 5.1 is serious
science, with its own set of compromises. We're
going to start by creating and verifying an 
exceptional full-range 2-channel system, then 
extending it to 5.1. Since we are using stereo 
subwoofers, it is logical to set the bass level on a 
per-speaker basis, but the two subs couple with each 
other and the distances between them and from the 
walls affect the total bass response. It’s not an easy 
affair, and you should approach it systematically. 

Objective Subwoofer Measurement: Put your earplugs back on and
send uncorrelated pink noise at -20 dBFS RMS to the LF system:
left satellite and sub. Turn up the left subwoofer's trim gain
until the RTA shows the low end is in the same ballpark as the
rest of the frequencies. You may see amplitude anomalies near the
splice point, indicating some parameters are not yet optimized.
Then check the polarity of the sub; the position that produces the
most bass is the correct one. If the result is ambiguous,
temporarily set the sub's cutoff frequency as high as possible and
recheck the polarity. The next part is the most time-consuming,
where art and science really combine, for the ideal splice will
happen only when the low-pass frequency, high-pass frequency,
subwoofer amplitude, time delay and phase are just right. Take
your time, "focusing" each parameter until the flattest response
is obtained at the splice point. If you must compromise, remember,
the ear finds peaks more objectionable than dips. Now take a
spatial average of the response over a few listening positions
around the sweet spot, and continue working until you're satisfied
the left sub is integrated according to the RTA.

You may have to move the subwoofer around to produce the flattest
extreme low end; the closer the sub is to walls or corners, the
higher the amplitude of the deep low bass. If you move the sub,
then you will have to readjust its time delay.

Next, if your room is symmetrical, it makes sense to try placing
the right subwoofer as a mirror image to the left. Occasionally,
though, this is not a good idea if the subs both end up at the
peak or null of a standing wave (expert acousticians apply here).
Repeat the above process with the right loudspeaker system. Now
send a mono pink noise source to all channels and solo both the
left and right system (including the sub), turning the master
monitor down until the 1 kHz band reads 68 dB, and see if the bass
response with both channels operating is still within tolerance.
Don't be surprised to see a heavier bass response than with the
individual channel reading. If it rises, even as little as a dB,
consider spreading the subs further apart to reduce their
coupling, but then again, if they approach the walls, the low bass
will go up from wall proximity. This interaction is at different
low frequencies, so hopefully you will find a position with the
least compromise.

Subjective Assessment, Stereo First 

We have not yet set the bass management for the center speaker or
the satellites, but now is a good time to check out the sound of
the full-range stereo pair with bass management. It would be nice
to discover a definitive piece of music that confirms your
subwoofers are now perfectly integrated with the rest of your
system. Since a subwoofer is not supposed to be a "boom machine"
for most music, it really should be conspicuous by its absence
rather than its presence. And that's the first way to listen.
Listen to music with the subwoofers on and off. They should not
feel "lumpy"; they should simply add a sense of weight to the
extreme low end. If the crossover frequency is 60 Hz or below,
then you may hardly notice a difference except for the solidity of
the sound. That's the way it should be!

Finding the right recording to evaluate bass is difficult because
recordings of bass are all over the map. It could take days to
check your subs by using a variety of recordings. An excellent way
to evaluate a full range system is with a recording of a string
bass whose level is very naturally recorded. I have been using one
of my own stereo recordings as a bass test record: my recording of
Rebecca Pidgeon, "Spanish Harlem," on Chesky JD115.

This song, in the key of G, uses the classic I, IV, V progression.
Here are the frequencies of the fundamental notes of this bass
melody:

49 62 73 

65 82 98 

73 93 110 
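
The table above can be cross-checked with the standard equal-temperament formula. This is a sketch under stated assumptions: A4 = 440 Hz tuning, MIDI note numbering, and the reading that each row of the table is the root, third and fifth of the I, IV and V chords in G (G, C and D major), starting from G1. None of these labels appear in the text itself. The computed values agree with the printed table to within about 1 Hz.

```python
A4_HZ = 440.0    # assumed tuning reference
A4_MIDI = 69     # MIDI note number of A4


def midi_to_hz(midi_note):
    """Equal-temperament frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def major_triad(root_midi):
    """Root, major third and perfect fifth above the root."""
    return [root_midi, root_midi + 4, root_midi + 7]


G1 = 31  # MIDI note number of the low G (about 49 Hz)

# Assumed chord labels for the three rows of the table.
PROGRESSION = {
    "I (G)": major_triad(G1),
    "IV (C)": major_triad(G1 + 5),
    "V (D)": major_triad(G1 + 7),
}

computed = {name: [midi_to_hz(n) for n in notes]
            for name, notes in PROGRESSION.items()}

if __name__ == "__main__":
    for name, freqs in computed.items():
        print(name, [round(f, 1) for f in freqs])
```

Seen this way, the lowest notes of the melody (49 to 65 Hz) fall in the region where the subwoofers take over from the satellites, which is what makes the recording a useful splice test.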

If the system has proper bass response, the bass should sound
natural; notes should not stick out too far or be recessed. Start
with the subs turned off and verify the lowest note(s) are a
little weak. Then turn the subs on and verify they restore the
lowest notes without adding any anomalies. Verify that the
addition of the subs does not move the instrument forward in the
soundstage (an indication the bass level is set too high) or make
it vague in its placement (an indication the subwoofers are too
far apart). It's that simple. Then, take a break and enjoy
Rebecca's performance for its natural acoustic reproduction of
voice, string and percussion instruments, and the acoustic depth
of a good recording hall. If you get this sound quality, then you
are off to a good start with an excellent 2-channel stereo system.

Bass Management for Center and Surrounds 

Our next job is to smoothly extend the low frequency response of
the center and surround loudspeakers. Once again insert
uncorrelated, calibrated level pink noise, with the master monitor
at the 0 dB position. Solo the center loudspeaker, and set the
bass management to feed the low frequencies of the center speaker
to the subwoofer(s). Adjust the highpass frequency of the center
loudspeaker to the same frequency used for the left and right (if
the center speaker is the same model as the sides). Then tweak the
bass management level trim of the center (the amount of energy
from center redirected to the subwoofer) until the total bass
response is as flat as possible with the RTA. Determining a
correct bass level from the two surrounds is a bit more
complicated, since they are electrically summed into a single mono
bass (unless the bass management is sophisticated enough to
redirect the left surround's bass to the left sub and vice versa).
Soloing each surround in turn, adjust the bass-management trim
from each one for flattest response, then check the bass response
from both surrounds at once with both uncorrelated and mono pink
noise. Favor the response with mono pink noise since we are
assuming that in typical music recording the bass will be in phase
in both surrounds.

LFE Gain Setting 

The LFE, or .1 channel, is an auxiliary channel designed to
increase the headroom of the bass channels. This is because when
extra bass is desired below about 50 Hz, the ear (which is
insensitive to bass) could require digital levels as much as 10 dB
hotter than full scale digital! In a properly-designed 5.1 system,
this headroom is taken care of in the design of the subwoofer. If
in doubt, check with the manufacturer. To meet the RP 200
standard, the individual RTA bands for the LFE channel only should
read 10 dB higher than the 1 kHz band. That is, 78 dB SPL if the
1 kHz band is at 68 with -20 dBFS RMS pink noise. Solo the LFE
output and adjust the level of the LFE channel trim until the 50
or 63 Hz band reads 78 dB.

This completes the monitor calibration. Now you're on the same
page as the most advanced 21st century mastering engineers. To
speak the same language, tell all your fellow engineers: "My
monitor system is calibrated with 0 dB reference SMPTE RP 200."
Now sit back and enjoy your calibrated multichannel reproduction
system!


V. Taking it Beyond: 

Monitor Equalization? 

My philosophy is to avoid monitor equalization unless absolutely
necessary. I believe that we should do everything possible to fix
room-induced problems acoustically, and to relocate subwoofers
and/or satellites if necessary for more linear response.
Equalization, if performed, should be done by a skilled and
experienced acoustician who understands the tradeoffs of
electrically equalizing the direct response when a room anomaly is
the root cause. When EQing, remember that the ear responds to the
direct and room sound differently than an RTA. Finally, consider
the tradeoff of additional noise and distortion if an equalizer is
added to a system.


1 If the satellites are good down to 40 Hz, so much the better, because the stereo
imaging will probably be more coherent with a lower crossover frequency.
However, when mastering for Dolby Digital, it is important to make a test listen
with a mono crossover at 100 Hz to be compatible with consumer bass
management systems. Many authorities recommend a 4th order (24 dB per
octave) lowpass on the woofer and a 2nd order (12 dB per octave) high pass on
the satellites.

2 Holman shows an individual band SPL of 70 dB SPL, but note that this was
taken with a pink noise signal of -18 dBFS. If the source noise is higher, then
we must expect a higher output SPL. Measurements will be much more
repeatable from room to room when you measure the 1 kHz band, as described
in the text. So, determine the level to use when measuring the 1 kHz band by
subtracting 14.8 (which all but perfectionists round to 15 dB) from the official
broadband SPL. For example, if the source of pink noise is at -20 dBFS RMS
broadband, the broadband SPL would be 83 dBC, so set the monitor gain until
the 1 kHz band reads 68.2 dB. If the source of pink noise is at -18 dBFS RMS
broadband, then the broadband SPL would be 85 dBC, and the 1 kHz band 70
dB (70.2). This is partly explained in a footnote to the SMPTE RP 200
specification.





COLOR PLATE 



Figure C8-01: SpectraFoo spectragram of the bass frequencies of several
measures from a rock piece. Read it like an orchestra score; time runs from
left to right. Red represents the highest levels. Note the bass runs in the 62-
125 Hz fundamental range are paralleled by second and third harmonics.


Figure C15-02: A K-20/RMS meter in close detail (close view near 0 dB), with
the calibration points. Figure annotations: a VU meter may display between
-2 and 0 dB with -20 dBFS pink noise, but the K-System meter displays 0 dB
(the correct value). RP 200 calibration points: 85 dB (C weighted) SPL with
pink noise at -18 dBFS; 83 dB (C weighted) SPL with pink noise at -20 dBFS.


Figure C15-01: The three K-System meter scales are named K-20, K-14, and K-
12. I've also nicknamed them the papa, mama, and baby meters. The K-20
meter is intended for wide dynamic range material, e.g., large theatre mixes,
"daring home theatre" mixes, audiophile music, classical (symphonic) music,
"audiophile" pop music mixed in 5.1 surround, and so on. The K-14 meter is for
the vast majority of moderately-compressed high-fidelity productions
intended for home listening (e.g. some home theatre, pop, folk, and rock
music). And the K-12 meter is for productions to be dedicated for broadcast.
Figure title: The K-System: Loudness and Headroom-Based. Figure notes: 0 dB
always equals 83 dBC SPL with pink noise on each K/RMS meter. Not shown:
detailed 1 and 1/3 dB increments or portions of scale below -24 dB. Legible
panel label: "Daring" Home Theatre, Wide-range Music, 20 dB HR over 83.
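
The relationship between the three scales can be sketched numerically. This is an illustration only, assuming the usual K-System convention that the number in each scale name is its headroom in dB, so that 0 dB on a K-N meter corresponds to an RMS level of -N dBFS (consistent with the "20 dB HR over 83" label and the -20 dBFS calibration point shown in the figures). The function names are hypothetical.

```python
# Assumed headroom (dB above the 0 dB mark) for each K-System scale.
K_SCALES = {"K-20": 20, "K-14": 14, "K-12": 12}


def k_reading_db(rms_dbfs, scale):
    """Reading on the given K scale for an RMS level expressed in dBFS."""
    return rms_dbfs + K_SCALES[scale]


def zero_point_dbfs(scale):
    """RMS level in dBFS that reads 0 dB on the given K scale."""
    return -K_SCALES[scale]


if __name__ == "__main__":
    # The -20 dBFS RMS calibration noise, seen on each of the three scales.
    for name in K_SCALES:
        print(name, k_reading_db(-20.0, name), "dB; 0 dB at",
              zero_point_dbfs(name), "dBFS")
```

The same calibration noise that reads 0 dB on K-20 reads below zero on the other two scales, which is the sense in which they trade headroom for average level.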



Figure C15-03: A K-14/RMS meter as implemented in SpectraFoo.


Figure C16-01: SpectraFoo during a moment of musical action. From left to 
right at top: K-14 Meter, bitscope, and stereo position indicator. Directly 
below the bitscope is a phase/correlation meter. In the middle of the screen is
a Spectragram, quiet section at left part, then the song begins. At top right is 
a stereo position indicator, and at the bottom, the Spectragram, left channel 
in green, right channel in red. 


Figure C16-02: SpectraFoo during a pause in the music. Only the bottom four
bits are toggling on the bitscope, and the characteristic curve of POW-R dither
type 3 is revealed on the Spectragram. The last notes of the music "fading to
black" can be seen at the right of the timeline on the Spectragram.


Figure C16-03: Comparing 16, 20,
and 24 bit flat-dithered noise
floors (red, orange, green traces,
respectively).


Figure C16-04: POW-R type 3 at
16-bit (red trace) noise floor, with
20-bit flat dither (orange) and
24-bit flat dither (green) for
reference.


Figure C16-05: Distortion and
noise performance of Millennia
Media NSEQ-2 analog equalizer in
tube mode (red), 20-bit random
noise floor for reference (blue),
24-bit noise floor (green), and Z-
Systems ZQ-2 digital equalizer
(yellow).




Figure C16-06: Distortion and
noise performance of analog
Millennia Media NSEQ-2 (red
trace), versus digital Z-Systems
set to truncate at
20 bits, no dither (blue trace).


Figure C16-07: Comparing two
digital compressors, both into 5
dB of compression with a 10 kHz
signal. Red trace: single precision,
non-oversampling. Green: 40-bit
floating point, double-sampling
and dithered to 24-bit fixed level.


Figure C16-08: Comparing
Cranesong HEDD-192 digital
analog simulator (blue trace) to
NSEQ (red).


Figure C16-09: A simple 10 dB
boost applied in two different
types of processors. In red, a
single-precision processor, whose
distortion is the result of
truncation of all products below
the 24th bit. And in blue, the
output of a 40-bit floating point
processor which dithers its output
to 24 bits.



Figure C16-10: Compares two
excellent-sounding digital
dynamics processors, the
oversampling Weiss DS1-MK2
(green trace), which uses 40-bit
floating point calculations, and
the standard-sampling Waves L2
(red), which uses 48-bit fixed
point. The switchable safety
limiter of the Weiss, which is not
oversampled, is shown in orange.



Figure C19-01:

Jitter testing:

16-bit J-Test signal (blue trace)
overlaid with the noise floor of
UltraAnalog A/D converter (red
trace), which together define the
limits of resolution of my jitter
test system.



Figure C19-02:

Jitter measurements with J-Test
signal:

Orange trace: TC DAC jitter on
internal sync, fed from Sonic
Solutions.

Red: TC DAC jitter on internal sync,
fed from Masterlink.

Blue: Consumer DAC fed from
consumer CD player.

Green: Consumer DAC fed from
Sonic Solutions.


Figure C19-03:

Jitter measurements,
demonstrating how different
clocking methods may produce
different sound with the same source
transport.

Masterlink transport feeding J-Test
signal to TC D/A.

Blue: TC D/A slaved to Masterlink
transport via AES/EBU.

Red: TC D/A on internal sync.

Figure C19-04: 

Jitter Measurements: 

J-Test signal feeding Weiss DAC on 
AES/EBU sync 


View from the bridge: Digital Domain's mastering studio. Visible in front of the listening couch are: rolling rack with Weiss EQ1-LP equalizer, Weiss DS1-MK2 dynamics processor, and Digital Domain DD-2
K-Stereo Processor; one pair of Dorrough meters; Reference 3A (satellite) loudspeakers on sand-filled stands plus Genesis servo-controlled subwoofers.


Chapter 15


How To Make Better Recordings in the 21st Century

Part Two: The K-System, an Integrated Approach
to Metering, Monitoring, and Leveling Practices


I. History: The VU Meter 

On May 1, 1999, the VU meter celebrated its 60th birthday. It is
60 years old, but still widely misunderstood and misused. The VU
meter has a carefully-specified time-dependent response to program
material that I call averaging to simplify discussion, but which
really means the particular VU meter response. This instrument was
intended to help program producers create consistent loudness
amongst program elements, but as it was a poor indicator of
recording overloads, the meter's designers depended on the 10 dB
or greater headroom over 0 VU of the analog media then in use.

Summary of VU Inconsistencies and Errors 

In general, the meter’s ballistics, scale, and 
frequency response all contribute to an inaccurate 
indicator. The meter approximates momentary
loudness changes in program material, but reports 
that moment-to-moment level differences are 
greater than the ear actually perceives. 

Ballistics: The meter's ballistics were designed to "look good"
with spoken word. Its 300 ms integration time does give it a
syllabic response, but does not make it accurate. One time
constant cannot sum up the complex multiple time constants that
make up the loudness perception of the human listener. Skilled
users soon learned that an occasional short "burst" from 0 to
+3 VU would probably not cause distortion, and usually was
meaningless with regard to loudness change.

VU meter operators are often fooled into treating the top and
bottom halves of the scale with equal weight, but the top half has
only 6 dB of the total dynamic range.

Scale: In 1939, logarithmic amplifiers were large and cumbersome
to construct, and it was desirable to use a simple passive
circuit. The result is a meter where every decibel of change is
not given equal merit. The top 50% of the physical scale is
devoted to only the top 6 dB of dynamic range, and, as
illustrated, the meter's usable dynamic range is only about 13 dB.
Not realizing this fundamental fact, inexperienced and experienced
operators alike tend to push audio levels and/or compress them to
stay within this visible range. The extreme needle movements make
it difficult to distinguish compressed from uncompressed material.
Soft material may hardly move the meter, but be well within the
acceptable limits for the medium and the intended listening
environment.
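
The "top 50% equals 6 dB" observation is what a meter with a linear (non-logarithmic) voltage scale produces. The sketch below idealizes the VU meter as a needle whose deflection is directly proportional to signal voltage; that idealization and the function names are assumptions for illustration, and a real VU movement will differ in detail.

```python
import math


def db_below_top(deflection_fraction):
    """Level, in dB relative to the top of the scale, at a given deflection.

    deflection_fraction is needle position as a fraction of full scale (0 to 1),
    assuming deflection is proportional to voltage.
    """
    return 20.0 * math.log10(deflection_fraction)


def deflection_for_db(db_relative_to_top):
    """Needle position (fraction of full scale) for a level in dB below the top."""
    return 10.0 ** (db_relative_to_top / 20.0)


if __name__ == "__main__":
    # Half-way up an idealized linear scale.
    print(round(db_below_top(0.5), 2), "dB at 50% deflection")
    # The bottom of a 13 dB usable range, as a fraction of the scale length.
    print(round(deflection_for_db(-13.0), 3), "of full scale at -13 dB")
```

On such a scale the whole lower half of the needle travel has to cover everything quieter than about 6 dB below the top, which is why soft material barely moves the meter.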


Frequency response: The meter's relatively flat frequency response
results in meter deflections that are far greater than the
perceived loudness change, since the ear's response is non-linear
with respect to frequency. Frequency distribution and average
level both affect loudness. For instance, when mastering reggae
music, which has a very heavy bass content, the VU meter may
bounce several dB in response to the bass rhythm, but perceived
loudness change is probably less than a dB.

Lack of adherence to standards: In current use, there are large
numbers of improperly-terminated mechanical VU meters and
inexpensively-constructed indicators which are labeled "VU." I've
seen fights break out amongst program producers reading different
"VU" instruments. A true VU meter is a rather expensive device and
it can't be called VU unless it meets the standard.

Over the past 60 years, psychoacousticians have learned how to
measure loudness much better than a VU. Despite all these facts,
the VU meter is a very primitive loudness meter. In addition,
digital technology lets us correct the non-linear scale, its
dynamic range, ballistics, and frequency response.

II. The Magic of 83 with Film Mixes 

Unlike music CDs, films are consistent from one to another,
because the monitoring gain has been standardized, as we learned
in Chapter 14. In 1983, as workshops chairman of the AES
Convention, I invited Tomlinson Holman of Lucasfilm to demonstrate
the sound techniques used in creating the Star Wars films. Dolby
Systems engineers labored for two days to calibrate the
reproduction system in New York's flagship Ziegfeld theatre. Over
1000 convention attendees filled the theatre center section. At
the end of the demonstration, Tom asked for a show of hands. "How
many of you thought the sound was too loud?" About four hands were
raised. "How many thought it was too soft?" No hands. "How many
thought it was just right?" At least 996 audio engineers raised
their hands.

The choice of 83 dB SPL has stood the test of time, as it permits
wide dynamic range recordings with little or no perceived system
noise when recording to magnetic film or high-resolution digital.
83 dB also lands on the most effective point on the
Fletcher-Munson equal loudness curve, which is where the ear's
frequency response is most linear. When digital technology reached
the large theatre, the SMPTE attached the SPL calibration to a
point 20 dB below full scale digital instead of 0 VU.* When we
converted to digital technology, the VU meter was rapidly replaced
by the peak program meter, which didn't faze the film world, but
definitely caused the music industry to suffer, as we shall see.

III. United We Stand At Home 

As we saw in Chapter 14, with the integration of media into a
single system, it is in the direct interest of music producers to
think holistically and unite with video and film producers for a
more consistent consumer audio presentation. New program producers
with little experience in audio production are coming into the
audio field from the computer, software and computer games arena.
We are entering an era where the learning curve is high, recording
engineers' experience is low, and the monitors they use to make
program judgments are less than ideal. It is our responsibility to
educate new engineers on how to make loudness and quality
judgments. A plethora of peak-only meters on every computer, DAT
machine and digital console does not provide information on
program loudness. Engineers must learn that the sole purpose of
the peak meter is to protect the medium and that something more
like average level affects the program's loudness.
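
The difference between what a peak meter shows and what an average-responding meter shows can be made concrete with two test signals. This is a minimal sketch, not a loudness meter: it uses a plain RMS average referenced to a full-scale value of 1.0 (one of several dBFS conventions, chosen here as an assumption), and the function names are hypothetical.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (+/- 1.0)."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale value of 1.0 (assumed convention)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def crest_factor_db(samples):
    """Peak-to-average ratio in dB."""
    return peak_dbfs(samples) - rms_dbfs(samples)


# Two signals that both peak at full scale: a sine wave and a square wave.
N = 1000
sine = [math.sin(2.0 * math.pi * 10 * n / N) for n in range(N)]
square = [1.0 if s >= 0 else -1.0 for s in sine]

if __name__ == "__main__":
    for name, signal in (("sine", sine), ("square", square)):
        print(name,
              "peak", round(peak_dbfs(signal), 2),
              "rms", round(rms_dbfs(signal), 2),
              "crest", round(crest_factor_db(signal), 2))
```

A peak-only meter reads the same for both signals, yet the square wave's average level is about 3 dB higher. The meter that protects the medium says nothing about which program will sound louder.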


Current-day leveling problems: The Loudness Race 

The loudness race is not new; in the days of vinyl, mastering
engineers competed to produce the loudest LP. But what is new is
the fantastic magnitude of the problem: due to the nature of the
digital medium, there is no longer the physical limit which was
previously imposed by analog mechano-electrical systems and
magnetic analog recording. Without that limit it is possible to
produce CDs whose average level is almost the same as the peak
level, an incredible 20 dB above the old average levels! Powerful
digital compressors and limiters enable mastering engineers to
produce a distorted signal for which there is no precedent in over
100 years of recording (see note 1). So, as we converted to
digital technology, the result became chaos, yielding
unprecedented differences in loudness between recordings.

The figure below shows a waveform taken from a digital audio
workstation, showing three different styles of music recording.
The time scale is about 10 minutes total, and the vertical scale
is linear, +/- 1 at full digital level; 0.5 amplitude is 6 dB
below full scale. The "density" of the waveform gives a rough
approximation of the music's dynamic range and crest factor. On
the left side is a piece of heavily compressed pseudo "elevator
music" I constructed for a demonstration at the 107th AES
Convention. In the middle is a four-minute song from a popular
compact disc produced in 1999. On the right is a four-minute
popular rock and roll recording made in 1990 that's quite
dynamic-sounding for rock and roll of that period. The perceived
loudness difference between the 1990 and 1999 CDs is greater than
6 dB, though both peak to full scale! Auditioning the 1999 CD, one
mastering engineer remarked, "This CD is a light switch! The music
starts, all the meter lights come on, and it stays there the whole
time." To say nothing about the distortion. Are we really in the
business of making square waves? Why has the average sound quality
of popular music CDs gone downhill since the introduction of the
digital medium, and what can we do to fix the problem? (See
note 2.)

Waveform figure (track labels: Elevator, Ricky Martin,
Mellencamp). On the left, moderately compressed "Elevator Music."
In the middle, a "top of the pops" selection from the year 1999.
At right, a rock and roll record from 1990. Vertical and
horizontal scales are identical.

* See Appendix 9 for discussion on how "85" became "83".




The psychoacoustic problem is that when two 
identical programs are presented at slightly 
differing loudness, the louderof the two often 
appears "better,” but only in short temí listening. 
This explains why CD loudness levels have been 
creeping up until sound quality is so bad that 
everyone can perceive it (illustrated below). And 
why there is a remarkable (and unnacceptable) 15 
dB difference in average level among pop CDs! 
Remember that the loudness "race” has always been 
an artificial one, since the consumer adjusts their 


[Figure: The Insane Increase in "Hottest" Pop CD Levels, with bars for
1980, 1990, 1995 and 2000. RED = Average Level; WHITE = Headroom for
peaks. The height of the red bar reflects perceived loudness and
potential loss of quality and clarity. Is this what will happen to
the next generation carrier (e.g. DVD-A, SACD)? It will, if we don't
take steps now to stop it.]


volume control according to each record anyway. This uncontrolled
situation is an obstacle to creating quality program material in the
21st century. What good is a 24-bit/96 kHz digital audio system if the
programs we create only have 1 bit dynamic range?

There are, of course, specific places where heavy compression is
needed: background listening, parties, bar and jukebox playback, car
stereos, headphone-wearing joggers, the loudspeakers at the record
stores, headphone auditioning at the record store kiosk, and so on.
In each of these cases, it should be possible to either produce a
custom-compressed CD just for the purpose, or to install a compressor
in the jukebox, CD changer, or reproduction system. Certainly this is
a lot less damaging than compromising recorded music for all
listeners. What we wish for is a low-fidelity replacement for the
analog cassette. Ironically, the compact disc has become its own
worst enemy, for it cannot be different things to different needs. *
I dream of a perfect world where all the MP3 singles are heavily
compressed and all the CD albums undamaged.

IV. The relationship between
SPL and 0 VU

Around 1994 I installed a pair of Dorrough meters, in order to view
the average and peak level simultaneously on the same scale. These
meters use a scale with 0 "average" (a quasi-VU characteristic I'll
call AVG) placed at 14 dB below full digital scale, and full scale
marked as +14 dB. Music mastering engineers often use this scale,
since a typical stereo





























1/2" 30 IPS analog tape has approximately 14 dB
headroom above 0 VU.

The next step is to examine a simple relationship between the 0 AVG
level and the sound pressure level. For many pop productions, our
calibrated monitor control sits at -6 dB (which yields 77 dB SPL with
-20 dBFS RMS pink noise).

Since on the meter, -20 dBFS reads -6 AVG, then 6 dB higher, or 0 AVG,
must be 83 dB SPL. This means we're really running average SPLs
similar to the theatre standard (our sound quality is not as


[Figure: The Dorrough Meter, with 77 dB SPL, 83 dB SPL and Full Scale
(+14 over 0 "VU", 0 dBFS Peak) marked. With the monitor control's
position set to 6 dB below the film reference, 77 dB SPL lands at -20
dBFS, or -6 AVG on the meter. Not by coincidence, this corresponds
with 83 dB SPL at the meter's 0 AVG point, revealing the obvious
correlation between a mastering engineer's meter ZERO and 83 dB SPL.]

clear as that of the theatre, and our loudness is probably slightly
lower because some high-frequency transients have been clipped by 6
dB of compression). Our "pop studio" headroom is only 14 dB above 83
instead of 20. The absolute loudness* of our pop presentation is
nominally 6 dB louder than a film in the theatre, necessitating
turning down the monitor gain by 6 dB.
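
The arithmetic in this section can be checked with a few lines of Python. This is only a sketch of the relationships stated above: it assumes a meter whose 0 AVG sits 14 dB below full scale and a monitor chain in which -20 dBFS RMS pink noise gives 83 dB SPL at the 0 dB (film reference) control position. The function and constant names are made up for the illustration.

```python
# Illustrative names only; the numbers are the ones quoted in the text.
AVG_ZERO_DBFS = -14.0   # 0 AVG sits 14 dB below full scale on this meter
REF_LEVEL_DBFS = -20.0  # calibration level (RMS pink noise)
REF_SPL = 83.0          # SPL of the calibration level at 0 dB monitor gain


def avg_reading(level_dbfs):
    """Reading on the AVG scale for an average level given in dBFS."""
    return level_dbfs - AVG_ZERO_DBFS


def spl(level_dbfs, monitor_gain_db):
    """Sound pressure level for an average level at a given monitor gain."""
    return REF_SPL + (level_dbfs - REF_LEVEL_DBFS) + monitor_gain_db


print(avg_reading(-20.0))          # -20 dBFS on the AVG scale
print(spl(-20.0, -6.0))            # calibration noise with the monitor at -6 dB
print(spl(AVG_ZERO_DBFS, -6.0))    # 0 AVG with the monitor at -6 dB
print(avg_reading(0.0))            # headroom between 0 AVG and full scale
```

With the monitor at -6 dB the calibration noise comes out at 77 dB SPL and 0 AVG at 83 dB SPL, matching the figures above.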


• ABSOLUTE LOUDNESS: A term I use when comparing the apparent loudness
of different sources without moving the monitor control.


Running a sound pressure level meter during the mastering session
confirms that the ear likes 0 AVG to end up circa 83 dB (~86 dB with
both loudspeakers operating) on forte passages, even in this
compressed structure. If the monitor gain is further reduced by 2 dB
the mastering engineer judges the loudness to be lower, and he raises
average recorded level, and the AVG meter goes up by 2 dB. It's a
linear relationship. 1 This leads us to the logical conclusion that we
can produce programs with different amounts of dynamic range by
designing a loudness meter with a sliding scale, where the moveable 0
point is tied to the same monitor SPL. Regardless of the scale,
production personnel would tend to place music near the 0 point on
forte passages.

V. The K-System Proposal 

This leads us to my K-System proposal, a metering and monitoring
standard that integrates the best concepts of the past with current
psychoacoustic knowledge in order to avoid the chaos of the last 20
years. It also develops a common language of levels, so that engineers
can properly communicate.

In the 20th Century we concentrated on the medium. In the 21st
Century, we should concentrate on the message. We should avoid meters
which have 0 dB at the top; this discourages operators from
understanding where the message really is. Instead, we move to a
metering system where 0 dB is a reference loudness, which also
determines the monitor gain. In use, programs which exceed 0 dB

1 Linear until things get so squashed that the increasingly compressed sound is
not equally louder for the same measured increase in the flat meter's average level.








give some indication of the amount of processing (compression) which
must have been used. There are three different K-System meter scales,
with 0 dB at either 20, 14, or 12 dB below full scale, for typical
headroom and SNR requirements. The dual-characteristic meter has a bar
representing the average level and a moving line or dot above the bar


"The K-System is not just a meter
scale, it is an integrated system
tied to monitoring gain."


[K-System Meter. 

For a color image, 
please see the 
Color Plates 
section, Figure 
C15-01] 


[Figure: The K-System: Loudness and Headroom-Based. 0 dB always equals
83 dBC SPL with pink noise on each RMS meter. Not shown: detailed 1
and 1/2 dB increments or portions of scale below -24 dB. Scale label:
"Daring" Home Theatre, Wide-range Music, 20 dB HR over 83.]

The three K-System meter scales are named K-20, K-14, and K-12. I've
also nicknamed them the papa, mama, and baby meters. The K-20 meter is
intended for wide dynamic range material, e.g., large theatre mixes,
"daring home theatre" mixes, audiophile music, classical (symphonic)
music, "audiophile" pop music mixed in 5.1 surround, and so on. The
K-14 meter is for the vast majority of moderately-compressed
high-fidelity productions intended for home listening (e.g. some home
theatre, pop, folk, and rock music). And the K-12 meter is for
productions to be dedicated for broadcast.




representing the most recent highest instantaneous 
(1 sample) peak level. 

Several accepted methods of measuring loudness exist, of varying
accuracy (e.g., ISO 532, LEQ, Fletcher-Harvey-Munson, Zwicker and
others, some unpublished). The extendable K-System accepts all these
and future methods, plus providing a "flat" version with RMS
characteristic that resembles the classic VU meter.


Note that full scale digital peak level is always at the top of each
K-System meter; it does not change. Only the average level calibration
slides: the 83 dB SPL point slides relative to the maximum peak level.
Using the term K-(N) defines simultaneously the meter's 0 dB point and
the monitoring gain, making this the first integrated metering and
monitoring system.*
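
As a rough illustration of the sliding scale, the sketch below converts a level in dBFS to a reading on each K-System scale. It assumes nothing beyond what is stated above (0 dB sits 20, 14 or 12 dB below full scale); the dictionary and function names are invented for the example.

```python
# Distance in dB from each scale's 0 dB point up to digital full scale.
K_SCALES = {"K-20": 20, "K-14": 14, "K-12": 12}


def k_reading(level_dbfs, scale):
    """Reading on the chosen K scale for a level given in dBFS."""
    return level_dbfs + K_SCALES[scale]


def full_scale_mark(scale):
    """Where digital full scale (0 dBFS) falls on the chosen K scale."""
    return k_reading(0.0, scale)


for name in K_SCALES:
    # The same -20 dBFS average reads differently on each scale, while
    # full scale always sits at the top of the meter.
    print(name, k_reading(-20.0, name), full_scale_mark(name))
```

The same recorded level therefore reads 0 on K-20 but -6 on K-14, which is why the scale has to be chosen together with the monitor gain.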


Simplified Explanation 

Many mastering engineers have recognized that the peak meter is
inadequate for judging loudness, so they use a traditional analog VU
meter. But because of the wide range of average levels on current pop
CDs, they use a variable VU meter attenuator to prevent the VU from
pinning or reading out of range. Think of the K-System as a
coordinated attenuator for both the averaging meter and the monitor
gain. The principle is that as we attenuate the average meter while
going from K-20 to K-14 we must also turn down the monitor gain, to
arrive at the same loudness to the ear. If the monitor gain were not
attenuated, then K-14 material reaching 0 dB average on its scale
would


• I invented these K-(N) terms because it was getting very awkward to describe the 
crest factor or loudness of music in a simple but useful way. 







sound 6 dB louder than K-20 material going to 0 dB
average on its scale.

Peak and Average calibrated to same decibel value
with sine wave

The peak and average scales are calibrated as per AES-17, so that peak
and average sections are referenced to the same decibel value with a
sine wave signal. In other words, +20 dB RMS with sine wave reads the
same as +20 dB peak, and this parity will be true only with a sine
wave. Analog voltage level is not specified in the K-System, only SPL
and digital values. There is no conflict with -18 dBFS analog
reference points commonly used in Europe.
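
The sine-wave parity can be demonstrated numerically. The sketch below is an illustration rather than a reference implementation of AES-17: it simply adds about 3.01 dB to the mathematical RMS so that a full-scale sine reads 0 dBFS on both the peak and the average measurement. The signal lengths and function names are arbitrary choices for the example.

```python
import math


def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs_sine_referenced(samples):
    """RMS level offset so that a full-scale sine wave reads 0 dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) + 20 * math.log10(math.sqrt(2))


n = 48000
# One second of a 1 kHz full-scale sine at a 48 kHz sample rate.
sine = [math.sin(2 * math.pi * 1000 * i / n) for i in range(n)]
# A full-scale square wave with the same peak value.
square = [1.0 if s >= 0 else -1.0 for s in sine]

print(round(peak_dbfs(sine), 2), round(rms_dbfs_sine_referenced(sine), 2))
print(round(peak_dbfs(square), 2), round(rms_dbfs_sine_referenced(square), 2))
```

For the sine the two figures agree; for the square wave the average reads about 3 dB above the peak, which is the sense in which the parity holds only for a sine wave.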



For medium-size control rooms, typical monitor gain (control position)
will be 0 dB with the K-20 meter, -6 dB with the K-14 meter, and -8 dB
with the K-12 meter. 0 dB monitor gain is the calibration point that
corresponds with the RP200 standard (see Chapter 14).
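
The coordinated attenuator idea can be put into numbers. The following sketch assumes the calibration described in this chapter (-20 dBFS RMS pink noise gives 83 dB SPL at 0 dB monitor gain) and the typical control positions quoted above; the helper names are hypothetical.

```python
REFERENCE_SPL = 83.0  # dB SPL for -20 dBFS pink noise at 0 dB monitor gain


def monitor_gain_db(headroom_db):
    """Typical monitor control position for a K-(N) scale: K-20 gives 0 dB."""
    return float(headroom_db - 20)


def spl_at(level_dbfs, gain_db):
    """SPL produced by an average level at the given monitor gain."""
    return REFERENCE_SPL + (level_dbfs + 20.0) + gain_db


for headroom in (20, 14, 12):
    zero_point_dbfs = -float(headroom)       # where 0 dB sits on this scale
    gain = monitor_gain_db(headroom)
    # With the coordinated gain, 0 dB on every scale lands on the same SPL.
    print(f"K-{headroom}", gain, spl_at(zero_point_dbfs, gain))

# Leaving the monitor at the K-20 position while playing K-14 material:
print(spl_at(-14.0, 0.0) - spl_at(-20.0, 0.0))
```

The last line gives the 6 dB loudness jump that results when the monitor gain is not attenuated along with the meter.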


VI. Production Techniques with
the K-System

To use the System, first choose one of the three meters based on the
intended application. Wide dynamic range material probably requires
K-20 and medium range material K-14. Then, calibrate the monitor gain
to RP200 as in Chapter 14. 0 dB always represents the same calibrated
(83 dBC) SPL on all three scales, unifying production practices


worldwide. If console and workstation designers standardize on the
K-System it will make it easier for engineers to move programs from
studio to studio. Sound quality will improve by uniting the steps of
pre-production (recording and mixing), post-production (mastering) and
metadata (authoring) with a common "level" language. By anchoring
operations to a consistent monitor reference, operators will produce
more consistent output, and everyone will recognize what the meter
means.

If making an audiophile recording, then use K-20; if making "typical"
pop or rock music, or audio for video, then probably choose K-14. It
will be hard for current pop mastering engineers to convert to K-14 or
even K-12 in some cases, because much of today's damaged pop music is
significantly hotter than even K-12, but we must find a way to back
off from the loudness race. Ideally, K-12 should be reserved strictly
for audio to be dedicated to broadcast; broadcast recording engineers
may choose K-14 if they feel it fits their program material. Pop
engineers are encouraged to use K-20 when the music has useful dynamic
range. The two prime scales, K-20 and K-14, will create a cluster near
two different monitor gain positions. People who listen to both
classical and popular music are already used to moving their monitor
gains about 6 dB (sometimes 8 to 12 dB with the hottest pop CDs). It
will become a joy to find that only two monitor positions satisfy most
production chores. With care, producers can reduce program differences
even further by ignoring the meter for the most part, and working
solely with the calibrated monitor.





Using the Meter's Red (Fortissimo) Zone. This 88-90 dB+ region is used
in films for explosions and special effects. In music recording,
naturally-recorded (uncompressed) large symphonic ensembles and big
bands reach +3 to +4 dB on the average scale on the loudest
(fortissimo) passages. Rock and electric pop music take advantage of
this loud zone, since climaxes, loud choruses and occasional peak
moments sound incorrect if they only reach 0 dB (forte) on any
K-System meter. Use the fortissimo range occasionally, otherwise it is
musically incorrect (and ear-damaging). If engineers find themselves
using the red zone all the time, then either the monitor gain is not
properly calibrated, the music is extremely unusual (e.g. heavy
metal), or the engineer needs more monitor gain to correlate with his
or her personal sensitivities. Otherwise the recording will end up
overcompressed, with squashed transients, and its loudness quotient
out of line with K-System guidelines.

Equal Loudness Contours. Mastering engineers are more inclined to work
with a constant monitor gain. But music mixing engineers often work at
a higher SPL, and vary their monitor gain to check the mix at
different SPLs. I recommend that mix engineers calibrate their monitor
attenuators so they can easily return to the recommended standard for
the majority of the mix. Otherwise it is likely the mix will not
translate to other venues, since the equal-loudness contours indicate
a program will be bass-shy when reproduced at a lower (normal) level.

Tracking/Mixing/Mastering. The K-System will probably not be needed
for multitracking; a simple peak meter is sufficient. For highest
sound quality, use K-20 while mixing and save K-14 for the calibrated
mastering suite. If mixing to analog tape, K-14 may prove more
appropriate. K-20 doesn't prevent the mix engineer from using
compressors during mixing, but I hope that engineers will return to
using compression as an esthetic device instead of trying to win the
loudness race.

Using K-20 during mix encourages a clean-sounding mix that's
advantageous to the mastering engineer. At that point, the producer
and mastering engineer should discuss whether the program should be
converted to K-14, or remain at K-20. The K-System can become the
lingua franca of interchange within the industry, avoiding the current
problem where different mix engineers work on parts of an album to
different standards of loudness and compression.

When the K-System is not available. Current-day analog mixing consoles
equipped with VUs are far less of a problem than digital models with
only peak meters. Calibrate the mixdown A/D gain to -10 dBFS at 0 VU
(sine wave), and mix normally with the analog console and VUs.
However, mixing consoles should be retrofitted with calibrated monitor
attenuators so the mix engineer can repeatably return to the same
monitor setting.

Adapting large theatre material to home use may require a change of
monitor gain and meter scale. Producers may choose to compress the
original 6-channel master, or better, remix the entire program from
the multitrack stems (submixes). With care, most of the virtues and
impact of the original production can be maintained



in the home. Even audiophiles will find a well-mastered K-14 program
to be enjoyable and dynamic. We should try to fit this reduced-range
mix on the DVD with the wide-range theatre mix.

Multichannel to Stereo Reductions. The current legacy of loud pop CDs
creates a dilemma because DVD players can also play CDs. Producers
should try to create the 5.1 mix of a project at K-20.

If possible, the stereo version should also be mixed and mastered at
K-20. While a K-20 CD will not be as loud (absolute loudness) as many
current pop CDs, it will probably be more dynamic and enjoyable, and
importantly there will not be a serious loudness jump compared to K-20
DVDs in the same player. If the producer insists on a hotter CD, try
to make it no louder than K-14, so there will be no more than a 6 dB
loudness difference between the DVD and the audio CD. Tell the
producer that the vast majority of great-sounding pop CDs have been
made at K-14 and the CD will be consistent with the lot, even if it
isn't as hot as the current hypercompressed fashion. The
hypercompressed CD is the one that's out of line, not the K-14.

Full scale peaks and SNR. As we've discussed (Chapters 5 and 14) it is
not necessary to peak a 24-bit recording to full scale. Another good
reason is that a program's signal-to-noise ratio is determined by its
actual loudness; the position of the listener's monitor level control
determines the perceived loudness of the system noise. If two similar
music programs reach 0 on the K-System's average meter, even if one
peaks to full scale and the other does not, both programs will have
similar perceived SNR. Use the averaging meter and your ears as you
normally would, and with K-20, even if the peaks don't hit the top,
the mixdown is considered normal and ideal for mastering.

Multipurpose Control Rooms. With the K-System, multipurpose production
facilities will be able to work with wide-dynamic range productions
(music, videos/films) one day, and mix pop music the next. A
simultaneous meter scale and monitor gain change accomplishes the job.
Operators should be trained to change the monitor gain according to
the K-standard.

In Color Plate Figure C15-02 is a picture of the K-20/RMS meter in
close detail, with the calibration points. Individuals who wish to use
a different monitor gain should log it on the tape (file) box, and try
to use this point consistently. Even with slight deviations from the
recommended practice, the music world will be far more consistent than
the current chaos. Everyone should know the monitor gain they like to
use.

In Color Plate Figure C15-03 is a picture of an actual K-14/RMS Meter
in operation at the Digital Domain studio, as implemented by Metric
Halo Labs in the program SpectraFoo™ for the Macintosh computer.
SpectraFoo versions 3f17 and above include full K-System support and a
calibrated RMS pink noise generator. On the PC, Pinguin has
implemented meters that conform exactly with the K-System. The
Dorrough and DK meters nearly meet K-System guidelines, but be sure to
use an external RMS meter for calibration since they use a different
type of averaging. In practice with program




material, the difference between RMS and other meter averaging methods
is imperceptible. I hope soon a company will implement the K-System
with a truer loudness characteristic.

Audio Cassette Duplication. Cassette duplication has been practiced
more as an art than a science, but it should be possible to do better.
The K-System may finally put us all on the same page, ironically just
in time for the cassette's obsolescence. It's been difficult for
mastering engineers to communicate with cassette duplicators, finding
a reference level we all can understand. The cassette tape most
commonly used cannot tolerate average levels greater than +3 over 185
nW/m (especially at low frequencies), and high frequency peaks greater
than about +5-6 are bound to be distorted and/or attenuated.
Displaying crest factor makes it easy to identify potential problems;
also an engineer can apply cassette high-frequency preemphasis to the
meter. An engineer can make a good cassette master by using a
"predistortion" filter with gentle high-frequency compression and
equalization. Use K-14 or K-20, and put test tone at the K-System
reference 0 on the digital master. Peaks must not reach full scale or
the cassette will distort. Apparent loudness will be less than the
K-standard, but this is a special case.

Classical music. The dilemma is that string 
quartets and Renaissance music, among other 
forms, have low crest factors as well as low natural 
loudness. Consequently, the string quartet will 
sound (unnaturally) much louder than the 
symphony if both are peaked to full scale digital. For 


example, dedicated classical producers have avoided mastering their
harpsichord recordings to full scale, or they sound unnaturally loud
at standard monitor gains. It's hard to get out of the habit of
peaking our recordings to the highest permissible level. I strongly
feel it is much better for the consumer to have a consistent monitor
gain than to peak every recording to full scale digital. Attentive
listeners prefer auditioning at or near the natural sound pressure of
the original classical ensemble. 4

Classical engineers should mix by the calibrated monitor, and use the
average section of the K-meter only as a guide. It's best to fix the
monitor at the 0 dB position and always use the K-20 meter even if the
peak level does not reach full scale. There will be less monitoring
chaos and more satisfied listeners. However, some classical producers
are concerned about loss of resolution in the 16-bit medium and may
wish to peak all recordings to full scale. I hope you will all
reconsider this thought when 24-bit media reach the consumer. Until
then chaos will remain in the classical field, and perhaps only
metadata will sort out the classical music situation at the listener's
end.

Narrow Dynamic Range Pop Music. We can avoid a new loudness race and
consequent quality reduction if we unite behind the K-System before we
start fresh with high-resolution audio media such as DVD-A and SACD.
Similar to the above classical music example, pop music with a crest
factor much less than 14 dB should not be mastered to peak to full
scale, as it will sound too loud.




Recommended:

1) Author with metadata to benefit consumers using equipment that
supports metadata.

2) If possible, master such discs at K-14 or even K-20.

3) Legacy music, remasters from often overcompressed CD material
should be reexamined for its loudness character.

If possible, reduce the gain during remastering so the average level
falls within K-14 guidelines. Even better, remaster the music from
unprocessed mixes to undo some of the unnecessary damage incurred by
the loudness race. Some mastering engineers already have made archives
without severe processing.
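
To make the recommendation concrete, here is a small sketch that measures the peak-to-average difference of a block of samples and computes the gain change that would put its average at 0 dB on a K-14 scale. It assumes a flat RMS average referenced so that a sine wave reads the same on peak and average (as described earlier for the K-System meter); the function names and the test tone are made up for illustration.

```python
import math

SINE_RMS_OFFSET_DB = 20 * math.log10(math.sqrt(2))  # about 3.01 dB


def peak_dbfs(samples):
    """Highest absolute sample value in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def average_dbfs(samples):
    """Flat RMS average, referenced so a full-scale sine reads 0 dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) + SINE_RMS_OFFSET_DB


def peak_to_average_db(samples):
    """Difference between peak and average as this meter would show it."""
    return peak_dbfs(samples) - average_dbfs(samples)


def gain_to_k_zero_db(samples, headroom_db=14):
    """Gain change that moves the average to 0 dB on a K-(headroom) scale."""
    return -headroom_db - average_dbfs(samples)


# An extreme narrow-range example: a steady full-scale tone.
tone = [math.sin(2 * math.pi * i / 48) for i in range(4800)]
gain = gain_to_k_zero_db(tone, 14)
print(round(peak_to_average_db(tone), 2))
print(round(gain, 2))
print(round(peak_dbfs(tone) + gain, 2))  # where the peaks land afterwards
```

For material like this the peaks end up well below full scale once the average sits at the K-14 reference, which is the intended result.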

Multichannel 

There's good news for audio quality: 5.1 surround sound. Current 5.1
mixes of popular music sound open, clear, beautiful, yet also
impacting. Six speakers provide much more headroom and sound output
than two, so if you work by the monitor gain, the channel meter levels
will tend to run a bit lower. What became clear while watching the
K-20 meter is that the best engineers are using the peak capability of
the 5.1 system strictly for headroom, the way it should be. System
hiss is not evident at 0 dB monitor position with long-wordlength
recording, good D/A converters, modern preamps and power amplifiers.

Labeling The Boxes

Since the K-System is extendable to future 
methods of measuring loudness, program producers 


should mark their tape boxes or digital files with an indication of
which K-meter and monitor calibration was used. For example, K-14/RMS,
or K-20/Zwicker. I hope that these labels will someday become as
common as listings of nanowebers per meter and test tones for analog
tapes.

VII. Metadata and the K-System

Metadata is data within data, that is, control data embedded in the
digital audio stream. Dolby Digital, MPEG2, AAC, and hopefully MLP
will take advantage of metadata control words (defined below); note
that standard PCM, as used in the Compact Disc, has no provision for
metadata, and to the best of my knowledge, neither does SACD.
Pre-production with the K-System will speed up the authoring of
metadata for broadcast and digital media. Music producers must become
familiar with how metadata affects the listening experience.

Metadata Control Words 

Dialnorm, dialogue normalization, also known as volume normalization,
is used in digital television and radio as "ecumenical gain-riding."
Program level is controlled at the decoder, producing a consistent
average loudness from program to program, with the amount of
attenuation individually calculated for each program and carried as a
command on the metadata word. At each program change, the receiver
decodes the dialnorm control word and attenuates the level by the
calculated amount, resulting in the "table radio in the kitchen"
effect. In a somewhat unnatural manner, like the radio, average levels
of sports broadcasts, rock and roll, newscasts, commercials,




quiet dramas, soap operas, and classical music all end up at the
loudness of dialogue, a rather strange effect, but no different
loudness-wise than standard radio today. The listener can turn his
receiver up and experience the intended loudness, without the noise
modulation and squashing of current analog broadcast techniques. Or,
he can choose to turn off the dialnorm on some receivers, and hear a
loudness variance from program to program.

Dialnorm is a simple gain change, without 
compression, and maintains the crest factor and 
dynamic range of the studio mix. For example, in 
variety shows, the music group will sound pleasingly 
louder than the presenter. Sports crowds will be 
excitingly loud, and the announcer will no longer 
"step on” the effects, because the bus compressor 
will be banished from the broadcast chain. 
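
The claim that a simple gain change leaves the crest factor untouched is easy to verify. The sketch below is not a model of any real decoder or bitstream format: the attenuation value and the sample data are invented, and the helpers only show that scaling every sample by the same factor lowers the level without altering the peak-to-RMS ratio.

```python
import math


def apply_gain_db(samples, gain_db):
    """Scale every sample by the same factor (a plain gain change)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def crest_factor_db(samples):
    """Ratio of peak to RMS, in dB."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / rms(samples))


# Made-up program material: a quiet passage with one loud transient.
program = [0.05, -0.04, 0.06, -0.05, 0.9, -0.3, 0.07, -0.06, 0.05, -0.04]

# Hypothetical attenuation that a decoder might apply for this program.
attenuated = apply_gain_db(program, -7.0)

level_change_db = 20 * math.log10(rms(attenuated) / rms(program))
print(round(level_change_db, 2))
print(round(crest_factor_db(program), 2), round(crest_factor_db(attenuated), 2))
```

A compressor, by contrast, changes the gain from moment to moment, so the two crest factor figures would no longer match.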

Mixlev. Dialnorm does not reproduce the dynamic range of real life
from program to program. This is where the optional control word
mixlev (mix level) enters the picture. The dialnorm control word is
designed for casual listeners, and mixlev for audiophiles or
producers. Very simply, mixlev sets the listener's monitor gain to
reproduce the SPL used by the original music producer. If the K-System
was used to produce the program, then K-14 material will require a 6
dB reduction in monitor gain compared to K-20, and so on. Attentive
listeners using mixlev will no longer have to adjust monitor gains for
different music types.

The use of dialnorm and mixlev can be extended to other encoded media,
such as DVD-A. Proper application of metadata, and of the K-System for
pre-production practice, will result in a far more enjoyable and
musical experience than we had at the end of the century of audio.

In Summary 

The designers of the compact disc never anticipated that an
all-digital recording system would yield an alarming loudness race and
seriously distorted music, worse than ever took place in the days of
the LP. I propose a new system with a common language, integrating
monitoring and loudness metering to produce more consistent masters,
and move audio practice into the 21st century. Teach everyone how; the
Rosetta stone is in this chapter.


1 Ironically, current day compression practices (especially in pop music) are far
more aggressive than necessary, even stronger than our approach to the noisier
analog medium of the past! CDs can and should be produced to the same audio
quality standard as the DVD, but I'd be satisfied with the leveling practices that
made good LPs.

2 I see an interesting analogy between the loudness race and the migration of pitch
since the 16th century. Music seems to be racing to be just a little more sharp
than the previous generation, so that an A played on an instrument tuned to
previous standards is now the G or G# of today, so it ultimately turns into a
problem of transposition. Unfortunately, audio systems cannot accommodate an
infinite loudness rise. We must voluntarily "transpose" back, or go deaf.

3 This is what the DVD and DVD-A proclaim to be, a single audio medium for all
needs, because the table radio or the car can contain built-in compression,
following the metadata coefficients laid down by the program producer. Let's
meet again in 20 years and see if that promise has been met.

4 The late Gabe Wiener produced classical recordings noting in the liner notes
the SPL of a short passage. He encouraged listeners to adjust their monitor
gains to reproduce the "natural" SPL which arrived at the microphone. I used
to second-guess Wiener by first adjusting monitor gain by ear, and then
checking against Wiener's number. Each time, I found my monitor gain was
within 1 dB of Wiener's recommendation, thus demonstrating that the natural
SPL is desirable for attentive, foreground listeners.

5 One of my first lessons in the inaccuracy of the VU meter was in 1972, when I
heard William Pierce, voice of the Boston Symphony, clearly and distinctly in
the noisy control room at Channel 24, yet he hardly moved the needle. The
trained operator must use his ears and learn how to interpret this instrument.





Chapter 16

Analog and Digital Processing


The mastering engineer must recognize when a recording is so good that
the interests of the client are best served simply by leaving it
alone. And there are recordings for which so little work is needed
that the gains due to processing would not warrant the losses due to
the same processing! For although equipment is getting better, there
is no such thing as a transparent audio processor. This chapter is
about how we measure and interpret performance, as there is an
interaction between objective degradation and subjective improvement.
Let's take a journey into the twilight zone between the objective and
the subjective.

I. The Ironies of 
Perception vs. Measurement 

Although we'll be using test measurements, we must remember that each
single measurement only provides a small part of the picture. An audio
processor is like an object inside a house with no doors, only a
number of small windows that you can peer into. By looking at the
object through each window's unique angle we can find out more, and
add up the clues, but we can never be totally sure of what we are
seeing, and must always leave open the possibility that there may be
some aspect we cannot see, some mystery as to why this equalizer
sounds "good" and this other one sounds "bad."

For example, here are a couple of "objective” 
measurements that just don’t add up! 

What Makes it Sound Bright? 

I've discovered a digital filter that measures "dull" but sounds
bright! The TC Electronic System 6000 lets the user choose between
different low-pass filters for the A/D and D/A converters. Some of the
filters roll off significantly above 16 kHz (at 44.1 kHz sampling), so
you'd think they would sound dull. But instead, to my ears, the 16 kHz
filters called Natural and Linear sound more open and clear than the
particular 20 kHz filter called Vintage. However, there are other
converters whose filters extend to 20 kHz and which sound even more
open than the TC's Linear filter. So measured bandwidth cannot tell
the whole psychoacoustic story. We look into the audible effects of
filtering in Chapter 18.


The Fallacy of Typical Weighting Curves 

We have equipment in our studio whose noise floor measures as low as
-120 dBFS to as high as -50 dBFS (after A/D conversion). However, much
of this equipment is perceptually quiet: if I have to put my ear up to
the loudspeaker to hear the hiss, then I consider it insignificant.
Interestingly, the weighting methods 1 by which converter
manufacturers commonly measure noise bear little relationship to human
perception. One particular converter whose A-weighted noise floor is
-108 dBFS sounds significantly quieter than another converter whose
A-weighted noise floor is -115 dBFS! The reason is that the
often-cited A-weighted curve does not adequately consider the ear's
greater sensitivity in critical bands. It turns out that the converter
which measures better (A-weighted) produces significantly more energy
circa 3 kHz, where the ear is most sensitive, and the

"Never turn your back on digital."

- Bob Ludwig


A-weighting filter does not take into account the significance of this
critical band. To be psychoacoustically accurate, noise measurement
standards should adopt a curve closer to the measured noise floor of
the human ear, such as the 9th order curve used by some of the
best-sounding dithers (see Chapter 4). This curve is called "F"
weighting. 2
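
For readers who want to see what the A-weighting curve does near 3 kHz, the sketch below evaluates the commonly published analog A-weighting response (the form given in IEC 61672, normalized to 0 dB at 1 kHz). It is offered only as a way to tabulate the curve; it says nothing about the "F" weighting mentioned above, which is not reproduced here, and the function name is arbitrary.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz, normalized to 0 dB at 1 kHz."""
    f2 = freq_hz * freq_hz
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(numerator / denominator) + 2.00


for freq in (100, 1000, 3000, 10000):
    print(freq, round(a_weighting_db(freq), 1))
```

The table shows heavy attenuation at low frequencies and only a small difference between 1 kHz and 3 kHz, so a noise floor concentrated around 3 kHz receives little extra penalty from this weighting.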

There are many other areas in which traditional measurements do not
correlate with what our ears tell us, particularly in the evaluation
of low bit rate coding systems. These systems measure quite well with
standard techniques, but once the ear has been trained to hear their
errors, we can easily identify artifacts we've never heard before with
analog technology, described by some as chirping, or space monkeys.
Let's see if we can objectively find out why some analog and digital
processors sound better than others. Just remember that measurements
look at an object through a few narrow windows, and there may be a
different, or better, explanation for sound quality than what I've
come up with.

II. Measurement Tools We Can Use 
While Mastering 

FFT Measurements 

FFT stands for Fast Fourier Transform. To really
learn how to interpret (and not misinterpret) an
FFT requires a college-level engineering course,
and although I cannot claim to be such an expert, I
have learned just enough to be dangerous! High-
resolution FFT analyzers, such as SpectraFoo™, are
very reasonably priced, thanks to the exponential
increase in CPU power, and they provide an essential
early warning system, a protection from the
vicissitudes (bugs) of digital audio. "Never turn your
back on digital," says Bob Ludwig, or as I say, "You’re
only one mouse click away from disaster." It’s a whole
new world based on software designed by fallible
human beings.

FFT for Music 

Figure C16-01 in the Color Plate section shows
SpectraFoo in action during a CD mastering session.

At the middle top is a bitscope, currently
showing 16 (and only 16) active bits, an indication
that the dither generator is probably doing its job.
This bitscope can reveal if some digital device is
malfunctioning, since one of the symptoms of a
dysfunctional processor is to toggle unwanted bits,
or hold some bits steady when there is no signal.
Bitscopes can also show if there are any unwanted
truncations caused by defective or misused
processors. However, the bitscope is only one of the
small windows we can look through; it can easily
miss problems, or seem to indicate problems which
require further interpretation. For example, some
equalizers produce idle noise when the music goes
to silence. This can be perfectly normal, but will
show up on the bitscope as activity. Toggling the
equalizer in and out while observing the bitscope
will establish whether the equalizer or some other
anomaly in the signal chain is the source of the
activity.
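The idea behind a bitscope can be sketched in a few lines of Python. This is only an illustration of the principle, not SpectraFoo's actual method: the function name and the sample data are hypothetical. It reports which bit positions in a 24-bit word take both values over a block of samples; bits that never change are either unused or stuck.

```python
def toggling_bits(samples, wordlength=24):
    """Return booleans, MSB first: True where a bit takes both 0 and 1."""
    mask = (1 << wordlength) - 1
    seen_one = 0
    seen_zero = 0
    for s in samples:
        word = s & mask              # two's-complement view of the sample
        seen_one |= word
        seen_zero |= ~word & mask
    both = seen_one & seen_zero
    return [bool((both >> (wordlength - 1 - i)) & 1) for i in range(wordlength)]

# Hypothetical data: 16-bit samples carried in the top of a 24-bit word.
sixteen_bit = [0, -1, 12345, -32768, 32767, 1, -2]
in_24_bit_word = [v << 8 for v in sixteen_bit]

active = toggling_bits(in_24_bit_word)
print(sum(active), "active bits")    # 16 active bits
```

With properly dithered 16-bit material in a 24-bit stream, the bottom eight positions stay at zero, which matches the "16 (and only 16) active bits" reading described above.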

At top right is a stereo position indicator, which
is frozen at a moment when the information is
slightly right-heavy. At left is a meter that conforms
to the K-14 standard (see Chapter 15). The meter
shows the hottest moment of a rather hot R&B piece
(which I would have preferred to reduce, but the
client desired it this hot!). For the record, this
material was monitored at -8 dB, which really makes
it K-12 material. Just below the bitscope is a
correlation indicator, revealing that the material is
significantly monophonic. I prefer a correlation
indicator to an oscilloscope; meter deflections
closer to the center of the scale indicate less
correlation from channel to channel and likely a
larger or more spacious stereo image. However, I
always use my ears to confirm the image is not too
"vague" and perform a mono (folddown) test to
make sure the sound is mono-compatible.
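The step from "monitored at -8 dB" to "K-12" is simple arithmetic, sketched below in Python. The sketch assumes that the calibrated 0 dB monitor position corresponds to K-20, as the K-System is laid out in Chapter 15 (not reproduced in this section); the constant and function names are illustrative.

```python
# Assumption: 0 dB monitor gain is the calibrated K-20 position (see Chapter 15).
K20_REFERENCE_HEADROOM_DB = 20

def k_scale_for_monitor_gain(monitor_gain_db):
    """Name the K-System scale that matches a given monitor gain offset."""
    headroom = K20_REFERENCE_HEADROOM_DB + monitor_gain_db
    return f"K-{headroom:g}"

print(k_scale_for_monitor_gain(-6))   # K-14
print(k_scale_for_monitor_gain(-8))   # K-12
```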

At mid-screen is the spectragram, showing
spectral intensity over time. This can be useful to
identify the frequencies of problem notes, or simply
to entertain visitors! At bottom is the spectragraph,
whose general rolloff shape gives a vague idea of the
program’s timbre (though most times I disregard
the spectral displays, since the eye candy of the
visual display distracts our aural senses).

Figure C16-02 in the Color Plates shows
SpectraFoo during a pause in the music, with only
the bottom four bits toggling, confirming that the
dither is working correctly, since dithers which use
heavy noise-shaping exercise several bits. Note that
the bitscope shows four bits toggling (since dither is
random, in this snapshot, bit 15 is at zero) and that
the spectragraph shows the curve of the dither
noise, which can be identified by its shape as POW-R
type 3 or a similar 9th order curve. Using this
analyzer, you can often determine the type of dither
used by the mastering engineer on recorded CDs.




The level meters had not decayed fully when this
shot was taken. The correlation meter fluctuates
very slightly near the meter’s center, showing that
the dither is uncorrelated between channels
(random phase). I always glance at this display at the
beginning and end of the program, to make sure no
bugs or patching errors have crept in. I carry a
SpectraFoo umbrella even if it’s not raining!
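A correlation meter can be approximated by a normalized cross-correlation of the two channels. The Python sketch below is an illustration under that assumption, not the meter's actual algorithm (real meters also apply time averaging), and the test signals are hypothetical. Identical channels read +1, polarity-inverted channels read -1, and two independent dither sources hover near the center.

```python
import math
import random

def correlation(left, right):
    """Normalized correlation of two channels, from -1 to +1."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

N = 20000
mono = [math.sin(2 * math.pi * i / 100) for i in range(N)]
inverted = [-x for x in mono]

# Hypothetical independent dither sources, one per channel.
rng = random.Random(0)
dither_l = [rng.random() - rng.random() for _ in range(N)]
dither_r = [rng.random() - rng.random() for _ in range(N)]

print("mono:            ", round(correlation(mono, mono), 3))
print("inverted:        ", round(correlation(mono, inverted), 3))
print("independent dither:", round(correlation(dither_l, dither_r), 3))
```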

III. Measurement Tools to Analyze
your Equipment

Let’s sort out what happens beneath the knobs.
Just as in geometry the shortest distance between
two points is a straight line, so too in audio, both
digital and analog, the cleanest signal path
contains the fewest components. Converters used
to be the most degrading pieces in the studio, and
although they have greatly improved in recent
years, we should still avoid extra conversion
whenever possible. For analog tapes, it’s best to do
all the analog processing on the way to the first and
only A/D conversion. But these days mixes are often
on digital tape, and as there are a lot of desirable
analog processors which the mastering engineer
may prefer because they sound more organic than
their digital equivalents, the tonal benefits of analog
processing might outweigh the transparency losses
of an extra conversion.* The best defense is a good
offense, and it is possible to reliably measure signal
below the noise with an FFT analyzer. An FFT can
confirm if a digital processor is not truly bypassed
when it says bypass, which can be pretty deleterious
(see Chapter 4). Jitter (see Chapter 19) is irrelevant
to FFT analyzers, which strictly look at data.


Even though the analyzer can only examine 24
bits (the limitation of the AES/EBU interface), it can
measure distortion 40 dB below the 24-bit noise
floor! This is because SpectraFoo is a 64-bit floating
point system. So we can compare the distortion of
processors which truncate at the 24th bit versus
others which use 48 bits or so internally and then
dither up to 24 bits. Whether we can hear these
differences is a different question. Psychoacoustician
J. Robert Stuart has demonstrated that we can hear a
24-bit truncation in an 18-bit system. The ear’s
dynamic range is approximately 20 bits (120 dB),
but this varies with frequency. At certain
frequencies we can even hear below 0 dB SPL!
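The conversions between bits and decibels used in this passage follow from the rule that each bit doubles the amplitude range, which is about 6.02 dB. A minimal Python sketch (the function names are illustrative):

```python
import math

DB_PER_BIT = 20 * math.log10(2)      # about 6.02 dB per bit

def bits_to_db(bits):
    """Amplitude range in dB spanned by a given number of bits."""
    return bits * DB_PER_BIT

def db_to_bits(db):
    """Number of bits equivalent to a given range in dB."""
    return db / DB_PER_BIT

print(round(bits_to_db(20)))         # 120
print(round(bits_to_db(24)))         # 144
print(round(db_to_bits(40), 1))      # 6.6
```

So the 40 dB of extra measurement depth mentioned above corresponds to roughly six or seven bits below the 24th.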

How Many Bits is Enough? 

In Color Plate Figure C16-03, we compare 16-, 20-,
and 24-bit flat-dithered noise.³ The levels of all the
"bins" add up, so at 16 bits, the curve which looks
like it rides at approximately -124 dBFS (level of
individual bins) totals to an RMS level of about -91.2
dBFS RMS, the theoretical limit of a properly-
dithered 16-bit system. But discrete signals at some
frequencies can be heard as low as -115 dBFS in a
properly-dithered 16-bit system, below which they
are buried in the noise. Psychoacoustically, for the
vast majority of popular and classical music, 16 bits
properly done are just enough to do the job right.

But as soon as we post-produce, copy, process and
change gain, we accumulate noise and need
professional headroom, or perhaps we should call it
footroom,† since the top, at 0 dBFS, is a constant.
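The statement that the bins "add up" can be made concrete. For flat noise, the powers of equal-level bins sum, so the total is the per-bin level plus 10·log10 of the number of bins. The Python sketch below is a simplified illustration: it ignores window and bandwidth corrections, and the bin count of 2048 is a hypothetical value, since the analyzer settings are not given in the text.

```python
import math

def total_level_db(bin_level_db, n_bins):
    """Total RMS level of n_bins equal-level noise bins (powers add)."""
    return bin_level_db + 10 * math.log10(n_bins)

def implied_bin_count(bin_level_db, total_db):
    """How many equal-level bins would produce the stated total."""
    return 10 ** ((total_db - bin_level_db) / 10)

# Figures from the text: about -124 dBFS per bin, about -91.2 dBFS total.
print(round(implied_bin_count(-124.0, -91.2)))     # roughly 1900 bins
# Hypothetical analyzer with 2048 bins:
print(round(total_level_db(-124.0, 2048), 1))
```

The two figures quoted above are therefore consistent with an analysis using on the order of two thousand bins.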

* And losses can be minimized using upsampling (see Chapter 1).

† This is a made-up word, not an official term!

Psychoacousticians studying the limits of the
human ear have determined that 20 bits is enough
for good A/D and D/A performance. Anything more
is just gravy, and it’s very rare to find a "24-bit"
converter with better than 18- to 20-bit noise level.
For processing, however, we need the additional
footroom, better than 24 bits, because the
frequency content of digital distortion is far more
annoying to the ear than analog distortions which
are much louder. This is because distortion
created during digital processing yields harmonic
components which beat against the sample rate,
producing dissonant inharmonic beat or
intermodulation products. For purist processing,
we may need as much as 48 to 72 bits, especially for
extreme gain changes, complex filtering,
compression, or to avoid cumulative distortion when
cascading processes. It’s a myth that there’s no
generation loss in digital processing; little by little,
bit by precious bit, sound suffers with every DSP
operation.

Figure C16-04 in the Color Plates shows the noise
floor of a popular dither called POW-R type 3 at
16-bit (red trace). For reference, we show the noise
of flat 20-bit dither (orange) and 24-bit dither
(green). POW-R’s shape is designed to maximize
performance by keeping the noise at or near the
ear’s low-level sensitivity at various frequencies.
POW-R dither reaches 20-bit performance in the
critical upper midrange (circa 3.5 kHz) where the
ear is most sensitive. Thus, much of the low level
ambience and reverberation that would have been
masked is revealed, even with 16-bit reproduction.
This performance can only be achieved by recording
at a longer wordlength to begin with, as noise
accumulates and the SNR gets slightly worse when
you add final dither to the processed source.

Analog versus Digital Processing 

Cheap versus Good... Is it Really Accurate?

Many people have argued that the reason we
notice harshness in some digital recordings is that
digital audio recording is more accurate than analog.
Their claim is that the accuracy of digital recording
reveals the harshness in our sources, since digital
recording doesn’t compress (mellow out) high
frequencies as does low speed (15 IPS) analog tape.
Accuracy, they say, is why we have regressed to tube
and vintage microphones. But I say this is only a
half-truth, since most of these arguments come
from individuals who have not been exposed to the
sound of good digital recording equipment, which is
not only accurate, but can even be warm and pretty.
Cheap digital equipment is subject to edgy sounding
distortion which can be caused by sharp filters, low
sample rates, poor conversion technology, low
resolution (short wordlength), poor analog stages,
jitter, improper dither, clock leakage in analog
stages due to bad circuit board design, and many
other causes, such as placing sensitive A/D and D/A
converters inside the same chassis with motors and
spinning heads. It takes a superior power supply
and shielding design to make an integrated digital
tape recorder that sounds good; compare the sound
of an inexpensive modular digital multitrack
(MDM) with the Nagra Digital recorder: 4 very
expensive tracks versus 8 cheap ones.

When it comes to processing, numeric
precision is also expensive, even though it’s all
software. Numeric imprecision in digital consoles
produces problems somewhat like noise in
analog consoles, but there is an important
difference: noise in analog consoles gradually and
gently obscures ambience and low-level material
and usually does not add distortion at low levels.
However, numeric imprecision in digital consoles
causes quantization errors (which increase at low
levels), destroying the body and purity of an entire
mix and creating an edgy, colder sound, which
audiophiles call digititis. Since digital consoles do
not make sound warmer, depending on the quality
of their digital processing (and the number of
passes through that circuitry), it might be better to
mix through a high-quality analog console.

Even though good digital equipment is getting
cheaper at an exponential rate, it is still expensive
to produce excellence in digital recordings. That’s
why analog tape and analog mixing remain very
much alive at this point in the 21st century.

MYTH: It’s a digital processor, so there’s no
generation loss.

MYTH: It’s a Digital Console. It must be better than
my old analog model!

Two Fine Equalizers, One Analog, One Digital 

In my opinion, much inexpensive tube
equipment is overly warm, noisy, unclear and
undefined, and the common use of "fuzzy" analog
equipment to cover up the problems of inexpensive
digital equipment is a band-aid, not a cure for the
loss of resolution. Not many people have been
exposed to recent audiophile-quality tube
equipment, and only the best-designed tube
equipment has quiet, clear sound and tight (defined)
bass, and is transparent and dimensional, yet still
warm. Audiophiles feel a well-designed tube circuit
can be more linear and resolving⁴ than a low-cost
solid state circuit. I certainly feel I hear more
through some amplifiers than others. Modern-day
tube designers often make innovative use of low-
noise regulated power supplies on filaments and
cathodes, a practice which was impractical in the
’50s.

Figure C16-05 in the Color Plates section shows the
low distortion and noise performance of a
well-designed, popular, state-of-the-art analog tube
equalizer, the Millennia NSEQ-2 (red trace). For
reference, 20- and 24-bit noise are shown in blue and
green, respectively. Notice that the tube noise of the
NSEQ is about 10 dB greater than 20-bit, making it
a virtual 18-bit analog equalizer. However, this
performance is dependent on the analog gain
structure used. If you drive the equalizer harder, its
noise floor will be lower compared to maximum
signal, and distortion may or may not be a problem.
Since the Millennia’s clipping level is around +37
dBu, it may be perfectly legitimate to drive it with
nominal levels of +10 dBu or even higher, provided
the source equipment doesn’t overload! Yet even
with nominal levels of 0 dBu, as was used for this
graph, this tube equalizer is extremely quiet. Its
noise is inaudible at any reasonable monitor gain
unless you put your ears up to the speaker,
demonstrating that noise floor is probably the least
of our worries. 1/2" 30 IPS 2-track analog tape has
even higher noise, but no one complains about it for
popular music.

"Audio processing is the art of balancing subjective
enhancement against objective degradation."
- Bob Olhsson

For this FFT, we set up a D/A converter, feeding
the NSEQ and then an A/D and the FFT. A digitally-
generated 1 kHz -6 dBFS 24-bit dithered sine wave
feeds the D/A. We adjust converter gain so 0 dBFS is
+18 dBu, and boost the equalizer about 6 dB, till just
below A/D clipping. The equalizer is coasting at this
level, since it’s around 19 dB below its clip level! If
you are looking for extreme "tubey" effects, you can
drive the equalizer even harder, and also realize a
greater SNR, provided the converters can handle the
hotter level; certainly the equalizer can.
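The gain structure in this test setup can be checked with a few lines of arithmetic. The Python sketch below takes the figures in this passage at face value (0 dBFS calibrated to +18 dBu, a -6 dBFS tone, about 6 dB of boost, and a 19 dB margin to the equalizer's clip point); the variable and function names are illustrative.

```python
FULL_SCALE_DBU = 18.0        # from the text: 0 dBFS is set to +18 dBu

def dbfs_to_dbu(level_dbfs):
    """Analog level in dBu for a digital level in dBFS at this calibration."""
    return FULL_SCALE_DBU + level_dbfs

tone_into_eq_dbu = dbfs_to_dbu(-6.0)             # +12 dBu into the equalizer
tone_out_of_eq_dbu = tone_into_eq_dbu + 6.0      # about 6 dB of boost
margin_to_clip_db = 19.0                         # from the text
implied_clip_dbu = tone_out_of_eq_dbu + margin_to_clip_db

print(tone_out_of_eq_dbu)    # 18.0, i.e. right at the A/D's 0 dBFS point
print(implied_clip_dbu)      # 37.0
```

The output of the equalizer sits at the converter's full-scale point while still leaving 19 dB before the analog stage clips, which is why the text describes the equalizer as "coasting."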

Notice that the equalizer’s distortion is
dominated by second, third, and fourth harmonics,
which tend to sweeten sound. For comparison, in
yellow is the performance of the superb Z-Systems
digital equalizer, dithered to 24 bits, boosting 1 kHz
5.8 dB with a Q of 0.7. Its harmonic distortion
performance is textbook-perfect (no visible
harmonics on the FFT). Some engineers use the
word "dry" to describe the sound of a component
that has little or no distortion. Looking through
other "windows" we find that harmonics are far
from the only sonic differences between these
pieces of gear. Tubes, power supplies and
transformers can loosen the bass, which can
sometimes be desirable; the digital equalizer retains
the tightness of the bass;* the digital and analog
equalizers’ curves are also different, though the
ZQ-2 does a nice job of simulating the shapes of
gentle analog filters. Equalizer curve shape and
phase shift probably make up other areas of delicate
sonic difference between models of equalizers.

* Since digital equalizers don’t soften the bass like some tube units, you may wish
to "loosen" the bass with compression or some other tool.

The premium price of both the ZQ-2 and the
NSEQ reinforces my point that high-quality analog
or digital recording is expensive. At the time of this
writing, it will be a number of years before there’s
enough power in a typical computer plug-in to come
up to the quality of the best outboard processors.

"Nasty” Digital Processors 

Truncation distortion can be fairly "nasty."
For example, in Figure C16-06 of the Color Plates
section, we compare the analog Millennia NSEQ
(orange trace) versus the digital Z-Systems set to
truncate at 20 bits, no dither (black trace).

Don’t try this at home! I think there are better
ways to add grunge than turning off the dither. Much
of the ambience, space, and warmth of the original
source have been truncated, lost forever, converted
to low level grunge (severe inharmonic distortion
and noise). Even a small amount of non-harmonic
distortion can be bothersome. Which sounds better,
an analog processor with a smooth but higher noise
floor, plus second and third harmonic distortion, or
an undithered digital processor with a lower average
noise floor plus inharmonic distortion?
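The loss of low-level information without dither can be demonstrated numerically. The following Python sketch is an illustration under stated assumptions, not a model of any particular processor: it rounds to the nearest step rather than truncating, uses a hypothetical sine whose amplitude is 0.4 of one quantization step, and uses simple flat triangular (TPDF) dither. Without dither the tone vanishes entirely; with dither it survives, buried in noise, and its amplitude can still be recovered.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer LSB (no dither)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB."""
    return rng.random() - rng.random()

def tone_amplitude(samples, cycles):
    """Estimate the amplitude of a sine with `cycles` whole cycles in the buffer."""
    n = len(samples)
    return 2.0 / n * sum(s * math.sin(2 * math.pi * cycles * i / n)
                         for i, s in enumerate(samples))

# Hypothetical test values: buffer length, cycles in buffer, amplitude in LSBs.
N, CYCLES, AMPLITUDE_LSB = 48000, 1000, 0.4
signal = [AMPLITUDE_LSB * math.sin(2 * math.pi * CYCLES * i / N)
          for i in range(N)]

rng = random.Random(1)
undithered = [quantize(s) for s in signal]
dithered = [quantize(s + tpdf_dither(rng)) for s in signal]

print("undithered:", tone_amplitude(undithered, CYCLES))
print("dithered:  ", round(tone_amplitude(dithered, CYCLES), 2))
```

The undithered output is all zeros, so everything below the last bit is "lost forever," as described above. The dithered output keeps the tone at close to its original amplitude at the cost of a slightly higher noise floor.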

Poorly-implemented digital compressors 
produce severe inharmonic distortion, which is 
without integer relationship to the fundamental. 
Figure C16-07 in the Color Plates compares two digital 
compressors, both into 5 dB of compression with a 
10 kHz signal. 





In orange is a single-precision, non-oversampling
compressor, and in black a double-sampling
compressor implemented in 40-bit floating point.
Note the single-precision compressor produces
many non-harmonic aliases of the 10 kHz signal,
especially in the critical midband. Nasty-sounding
first-generation compressors are still common in
low-cost digital consoles and DAW plug-ins. It takes
a lot of processing power to double-sample. I’m
convinced that the proliferation and misuse of cheap
digital processing has degraded the sound quality of
much recently-recorded music.
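To see why harmonics of a 10 kHz tone land in the midband, it helps to compute where they fold. The Python sketch below is a simplified illustration: it assumes the nonlinear process generates harmonics directly in the sampled domain with no band-limiting, and the function name is illustrative. Any component above half the sample rate reflects back into the audio band.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a component at f_hz appears after sampling."""
    f = f_hz % sample_rate_hz
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f

FUNDAMENTAL_HZ = 10000
for sample_rate in (44100, 88200):
    print(f"sample rate {sample_rate} Hz:")
    for n in range(2, 6):
        harmonic = n * FUNDAMENTAL_HZ
        print(f"  harmonic {n} ({harmonic} Hz) appears at "
              f"{alias_frequency(harmonic, sample_rate)} Hz")
```

At 44.1 kHz the third, fourth and fifth harmonics fold down to 14.1, 4.1 and 5.9 kHz, none of which is harmonically related to 10 kHz. At 88.2 kHz the same harmonics either stay where they belong or fold to frequencies above 20 kHz.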

The Magic of Analog? 

Static distortion measurements don’t explain
every reason why some compressors sound excellent
and others hurt your ears. There are analog
processors which are so magical that though they
are not transparent, they add an interesting and
exciting sonic character to music, or to put it
another way, their subjective cure is better than their
objective disease. Analog tape recording is a perfect
example of this type of process; measured
objectively it’s noisy and distorted, but subjectively
it can kick ass! If psychoacoustic research had been
a bit more advanced on the audible effects of
masking distortion and noise, then perhaps we might
not have pursued this expensive search for 144 dB
extremes. For example, the noise floor of the Sony-
Philips DSD system is not particularly special (about
120 dB in the audible band), but it sounds excellent,
indicating that low noise must not be our only goal.
We may even conclude that part of the good sound is
due to masking; maybe -120 dB is just enough to
cover the ugly parts of the distortion of even some of
our best analog and digital gear. In addition, noise-
free recording media can be very sterile-sounding
because all the nits and cracks and distortions
caused by the musicians and their amplifiers are
completely revealed by the quiet media. So,
sometimes, adding extra noise can be more
beneficial to the music than working noise-free.
Perhaps this is one of the many reasons why analog
tape sounds more musical to many people: noise can
be very euphonic. We should certainly experiment
with noise-masking and make our decisions on what
is best for the music. [Please see sidebar, Clarity or Fuzz.]

I think that many classic analog compressors’
warm, fat yet clear sound signatures come from a
unique combination of attack and release
characteristics, which may be emulated in a digital
processor. There are some plug-ins which emulate
classic analog compressors but to my ears they do
not come up to the job; I think they will get better
over time when the cost of DSP goes down.
Currently, plug-in designers are forced to minimize
the DSP load of their processors or users complain
they can’t fit a plug-in on every channel strip (as if
this is desirable). Certainly the Weiss digital
compressor does not sound digital, so we know that
it can be done with programming skill and
expensive DSP.

An Analog Simulator: Pick your flavor of grunge

Figure C16-08 in the Color Plates compares the
NSEQ to the Cranesong HEDD-192, a digital analog
simulator of excellent sound quality.

The Cranesong (blue trace) has been adjusted to
produce a remarkably similar harmonic structure to
the NSEQ. For this graph, its levels have been
purposely set to produce more distortion than the
Millennia was producing. Amazingly, the ear thinks
it’s hearing an excellent analog processor without
any imaging or resolution loss. But the low-level
grunge at the bottom of the picture looks mighty
suspicious; looking through this "window" you
might think the Cranesong was truncating
important information. But two important factors
ameliorate this. First, the Cranesong’s grunge is
about 12 dB lower than that of a truncated device
and thus is likely masked by the noise and the
euphonic harmonics. Secondly, the HEDD has a
unique summing internal architecture that does not
alter, truncate or recalculate the original source
signal. The Cranesong clones the original source
and sends that to its output, while mixing in the
calculated distortion, thereby largely preserving the
ambience and space of the original. The low level
distortion in the figure is part of the additive
distortion signal and not a result of recalculations to
the source. In other words, only the distortion is
distorted! We took this measurement first at
44.1 kHz, then at 88.2 and 96 k. As you can see in the
two accompanying figures, at 96 k the low level
grunge is virtually gone, and the Cranesong’s
distortion is even cleaner, if that’s not a
contradiction in terms!

Cooking Better Sound—Naturally 

There are certain analog consoles whose
character is highly prized because they add spice,
dimension and even punch to a mix. One name that
comes to mind is API, which to my ears has an
excellent combination of desirable linearities (like
headroom and bandwidth) and nonlinearities. I
think the subtle "grit" in their discrete opamps
could even be slight intermodulation distortion,
which does just the right thing for rock and roll yet
is subtle enough for jazz and classical depending on
how you drive the stages (a matter of taste). I think
the transformers add some punch or fattening via
saturation and 2nd and 3rd harmonic distortion as
well as some upper harmonics and a touch of phase
shift (which could add some dimensionality).

Our role as mastering engineer is like that of the
master chef who knows just how much and what
kind of spice is useful to add pizzazz without
overcooking or spoiling the flavor. By the middle of
our careers we have collected a sizable analog and
digital spice rack! The Cranesong can mimic three
types of naturally-occurring analog distortion,
called Triode, Pentode and Tape. The triode control
adds a pinch of salt, pure second harmonic, which,
being the octave, is quite subtle, almost inaudible
with some music. It can clear up the low end by
adding some definition to a bass, but it can also thin
out the sound too much. The pentode is extremely
versatile; it provides both salt and pepper. At lower
levels it adds third and fifth harmonics, which are
dangerously seductive, producing a unique presence
boost and brightness with little grunge or digititis,
especially at 96 kHz SR (pictured). At higher levels,
additional odd harmonics add grit and some
fatness, like an overdriven pentode tube: a Marshall
amplifier in a 1U rack-mount box! Past the fifth,
subtle amounts of seventh and ninth harmonics add
a sometimes desirable "edge."

Figures: Comparing Cranesong HEDD-192 in Pentode
mode at two different sample rates with a 10 kHz
-15 dBFS test tone. At top, 44.1 kHz SR; at bottom,
96 kHz. Note the different frequency scales, since the
higher sample rate displays harmonic frequencies of
the audio signal up to 48 kHz.

Clarity or Fuzz, which is best?

There’s nothing wrong with using fuzz if it produces the
right esthetic result. With high-resolution digital
recording, tube equipment can add a nice flavor.

Or, it can be used as a useful cover-up, a fuzzy band-
aid. A client once told me, "Bob, your mastering is so
much clearer than the mix, I’m starting to hear all the
mistakes!" Yes, high-resolution processing revealed more
and more of the source, but this came at a price: all the
warts were revealed. I solved the problem by fuzzing up
the sound slightly with some delicate tape-style
harmonics.

For if the performance is not the absolute best, or the
mix is not wonderful, or the sound is just better when it’s
not perfectly clear, then fatness, masking fuzz, or analog
distortion magic may be just the right approach for the
music. In mastering I usually prefer to accomplish this by
first passing the signal through the highest resolution
electronics, which add little or no distortion, and then
add a touch of the fuzzy sauce with a selectively fuzzy
component or a noisy dither. This approach is methodical,
controllable, and reversible.

Clearly, artful use of noise can mask and therefore
ameliorate some low-level distortions. Ironically, digital
recording’s super low noise may be its greatest enemy.

The Cranesong’s tape control is the sugar,
which when mixed in, can sweeten the
pentode pepper, yielding flavors from red
to yellow, green or Jalapeño! The
celebrated third harmonic (an octave plus a
fifth) sweetens and fattens the sound,
much like analog tape. Tape also produces
the fat sound of analog tape, which helps to
"glue" a mix together. Tape can help
digitally-mixed sources that may be well-
recorded but miss some of that "rock and
roll fatness." The control produces largely
second and third harmonic distortion, but
as it’s advanced, it adds some additional
higher harmonics, emulating analog tape
performance. Too much sugar gives slow,
muddy molasses, a rarely desirable
quantity, but available if you need it. But
just a light amount can act as a sweet-
sounding band-aid to ameliorate truncated
or edgy recordings. Regardless, space and
depth have been permanently lost if there
was truncation prior to the use of the
Cranesong.* No one is sure why, but critical
listeners have observed that adding delicate
amounts of harmonic distortion in just the right
proportion appears to enhance the depth and clarity
in a recording. The trick is to know the exact
amount.⁵
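The musical intervals attributed to the harmonics above (the second harmonic as the octave, the third as an octave plus a fifth) can be verified with a short calculation. This Python sketch converts a harmonic number to an interval in equal-tempered semitones above the fundamental; the function name is illustrative.

```python
import math

def harmonic_interval_semitones(harmonic_number):
    """Interval above the fundamental, in equal-tempered semitones."""
    return 12 * math.log2(harmonic_number)

for n in (2, 3, 5):
    semitones = harmonic_interval_semitones(n)
    octaves, remainder = divmod(semitones, 12)
    print(f"harmonic {n}: {int(octaves)} octave(s) + {remainder:.2f} semitones")
```

The second harmonic is exactly one octave. The third is one octave plus about 7.02 semitones, a nearly pure fifth. The fifth harmonic is two octaves plus about 3.86 semitones, slightly flat of an equal-tempered major third.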

* Though Digital Domain’s K-Stereo process does a pretty good job of restoring
that lost ambience.

Single Precision, Double Precision, or Floating Point?

First-generation digital processors gave digital
processing a bad name. But single precision 24-bit
processors are going the way of the Dodo, at least in
respectable audio equipment. All things being equal
(and they never are), 32-bit floating point
processors are generally regarded as inferior-
sounding to 48-bit (double-precision fixed) and
40-bit float. Some newer floating-point devices,
such as the software program ChannelStrip by
Metric Halo, work in 64-bit and have impressively
low measured distortion. However, one designer,
Z-Systems, has produced a 32-bit floating point
digital equalizer using proprietary distortion-
reducing techniques that sounds very good and
measures as well as some other equalizers using
longer wordlengths. Ultimately the skill of the
designer determines how nice the device sounds.
The mathematics involved are not trivial, and the
designer’s choice of filter coefficients can make as
much difference as his choice of wordlength.

Figure C16-09 in the Color Plates shows that with a
single precision processor, even a simple gain boost
can ruin your digital day. A dithered 24-bit 1 kHz
tone at -11 dBFS is passed through two types of
processors, each boosting gain by 10 dB. The
distortion of the single precision processor (red
trace) is the result of truncation of products below
the 24th bit. Nevertheless, the highest distortion
product, at -142 dBFS, is extremely low. I believe
the sound of a single 24-bit truncation may not be
audible, but cumulative truncation adds enough
inharmonic distortion to become annoying to the
sensitive ear. In blue we compare the perfectly clean
output of a 40-bit floating point processor which
dithers its output to 24 bits. I measured similar
performance with a 48-bit (double precision)
processor and a 32-bit floating point processor,
which both dither to 24 bits.
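Why does a simple gain boost create "products below the 24th bit"? Multiplying a sample by a gain coefficient yields a longer word, and a single-precision processor must discard the extra low-order bits. The Python sketch below illustrates this with integer arithmetic. The fixed-point layout (23 fractional bits in the coefficient) and the sample value are hypothetical, chosen only to make the word growth visible; they do not describe any specific processor.

```python
# Hypothetical fixed-point layout: gain coefficient with 23 fractional bits.
FRAC_BITS = 23

def to_fixed(gain):
    """Convert a linear gain to a fixed-point integer coefficient."""
    return round(gain * (1 << FRAC_BITS))

def apply_gain_truncate(sample, coeff):
    """Multiply, then discard the low-order bits (no dither)."""
    return (sample * coeff) >> FRAC_BITS

gain = 10 ** (10 / 20)               # +10 dB as a linear factor, about 3.162
coeff = to_fixed(gain)
sample = 123457                      # an arbitrary 24-bit-range sample value

full_product = sample * coeff        # the long word before truncation
output = apply_gain_truncate(sample, coeff)
discarded = full_product - (output << FRAC_BITS)

print("bits in full product:", full_product.bit_length())
print("output sample:       ", output)
print("discarded low bits:  ", discarded)
```

The discarded remainder depends on the signal, which is why repeated truncation accumulates as distortion rather than as benign noise. A longer internal wordlength with dither applied once at the output, as described above for the 40-bit and 48-bit processors, avoids throwing those bits away at every stage.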


Double Sampling? 

The most advanced digital equalizers and
dynamics processors use double sampling
technology, which means that the internal sampling
rate is doubled to reduce aliasing distortion. High-
quality linear phase filters are used in the internal
sample rate converters. I’m not certain this has
audible meaning for equalizers,⁶ but dynamics
processors benefit because non-linear processing
generates severe aliases of the sampling rate, and
the higher the sample rate, the less aliasing.

Figure C16-10 in the Color Plates compares two
excellent-sounding digital dynamics processors, the
oversampling Weiss DS1-MK2, which uses 40-bit
floating point calculations, and the standard-
sampling Waves L2, which uses 48-bit fixed point.

To compare apples to apples, both processors
are limiting by 3 dB, with the Waves in red, and the
Weiss in green, set to 1000:1 ratio. Note the
oversampling processor exhibits considerably lower
quantization distortion. However, the switchable
safety limiter of the Weiss, which is not
oversampled, produces considerable alias distortion
even at 1 dB limiting (orange trace). At 88.2 kHz and
above (not shown), the Weiss safety limiter and the
Waves perform measurably better, and double
sampling may not be needed. Thus there is
considerable advantage in doing all our processing at
higher rates, which moves the distortion products
into the inaudible spectrum above 20 kHz. Then,
sample rate convert to 44.1 kHz during the last step,
which filters out most of the high-frequency by-
products.




Despite the measured differences, the
"window" we’ve chosen (steady-state sinewave
performance) probably has little to do with the
perceived performance of these two excellent-
sounding limiters. Because steady state
measurements have little or no relationship to
audible performance of limiters, I believe the key to
the ear’s reaction is the duration of the limiting
action. In typical use, limiters go into gain reduction
for a very short time. At limiting ratios of 1000:1,
with instantaneous attack and fast release, these
processors produce only momentary distortion,
shorter than the human ear’s sensitivity to
distortion (about 6 ms according to some
authorities). But if a user overpushes a limiter so
that it is working on the RMS levels of the material
as well as the peaks, then its sinewave-measured
distortion becomes audibly significant.

Compressors, however, are different animals,
and double sampling is critical for them, because a
compressor may be into gain reduction for a good
percentage of the time. I feel that double-sampling
contributes to the Weiss’s robust and warm sound
when used as a compressor. While Heavy Metal
recordings employ considerable distortion for
effect, classically they employ analog processors for
this purpose to avoid the inharmonic aliases of
typical digital processors.

Better Measurement Methods? 

It should be clear by now that we can easily
measure simple phenomena that are probably too
subtle to hear (such as single tone harmonic
distortion near the 24 bit level). But we can hear
(perceive) very complex phenomena that are
difficult to describe with measurements (such as the
sound quality of one equalizer versus another).

What we will need to better describe such complex
audible phenomena are psychoacoustically-based
measurement instruments that have not yet been
invented. Current research and development of
coded audio such as MP3 (that benefits from the
ear’s masking) could lead to better noise and
distortion analysers that can discriminate between
distortion we can and cannot hear.

The Bonger: A Listening Test

Since current steady-state sine-wave 
measurements are misleading when measuring 
nonlinear processors like compressors, a more 
effective measurement method is by listening: using 
the gonger aka bonger, originally developed by the 
BBC’s Chris Travis and available on a test CD from 
Checkpoint Audio (see Appendix 10). This test is a 
pure sine wave that modulates through various
amplitudes, in order to exercise and reveal any 
amplitude non-linearities in the signal path. Just 
play the bonger through the device under test and 
listen to the output for noise modulation, buzz or 
distortion. 

Identity Testing—Bit Transparency 

Any workstation that cannot make a perfect 
clone should be junked. The simplest test is the 
identity test, or bit-transparency test. Set a digital 
equalizer to flat and unity gain, then test to see if it
passes signal identical to its input. Some people 
scoff at this test, since analog equipment almost 
never produces identical output. But the test is
important, since digital equipment can produce
egregious distortion as we have seen. The bitscope
can aid in null testing; it is quite likely that a device
is bit-transparent if you selectively put in 16 bits,
then 20, then 24, and get out the same as you put in.
You can also watch a 16 or 20-bit source expand to
24 bits when the gain changes, during crossfades,
or if any equalizer is changed from the 0 dB
position. A neutral console path is a good indication
of data integrity in a DAW. After the bitscope, your
next defense is to perform some basic tests, for
linearity, for distortion with the FFT, and finally,
test for perfect clones (perfect digital copies). The
null test confirms bit-for-bit identity: Play two files
at the same time, inverting the polarity of one and
mixing the two together. There must be zero output
or the two files are not identical. Since designers are
fallible human beings, you should carry out basic
tests on your DAW for each software revision.
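The null test described above can be sketched in a few lines, assuming the two files have already been decoded into equal-rate lists of integer PCM sample values. The function names and the sample data are hypothetical; reading real audio files is left out.

```python
def null_test(samples_a, samples_b):
    """Invert the polarity of the second file and mix it with the first.

    Returns the residual; every value is zero only if the files are identical.
    """
    return [a + (-b) for a, b in zip(samples_a, samples_b)]


def is_bit_identical(samples_a, samples_b):
    """True when the two sample sequences null completely."""
    if len(samples_a) != len(samples_b):
        return False  # a length mismatch already rules out a perfect clone
    return all(residual == 0 for residual in null_test(samples_a, samples_b))


# Hypothetical 24-bit sample values:
original = [0, 1200345, -8388608, 8388607, -17]
clone = list(original)
processed = [0, 1200345, -8388608, 8388606, -17]  # one LSB changed

clone_ok = is_bit_identical(original, clone)          # True
processed_ok = is_bit_identical(original, processed)  # False
```

Working on integers rather than floating-point values keeps the comparison exact, so even a single altered least significant bit leaves a non-zero residual.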

Choose your Weapon 

So, which to use, analog or digital processing? A
few years ago, I didn’t like the sound of cumulative
digital processing. I could tolerate a couple of the
best-designed single-precision units in series.
After that, it was back to analog.

If processing digitally, be aware of the 
weaknesses of the equipment. Until manufacturers 
adopt more powerful processors, and processing 
power catches up, limit the number of passes 
through any digital system. Each pass will sound a
little bit colder even using 24 bit storage. A mix
made through a current-day digital console may
or may not sound better than one made through a
high-quality analog console, depending on several
factors: the number of passes or bounces that have
been made, the number of tracks which are mixed,
the quality of the converters which were used, the
outboard equipment, and the internal mixing and
equalization algorithms in the digital console. While
no console equalizer currently has the power of a
$6000 Weiss, economically it's a lot simpler to
replicate a good equalization algorithm for 144
channels than performing the equivalent in analog
hardware, so there is hope for the digital console’s
future, when silicon will be cheaper.


"The Source Quality Rule: Always start
out with the highest resolution source
and maintain that resolution for as
long as possible into the processing."

And there's no turning back; 24-bit recording and
high sample rates are taking over, and they sound
better, so for mastering we can
choose from the best of several worlds, and we make
our choices by balancing the benefits and the losses:

• (some) very transparent, low-noise,
pure-sounding digital gear

• (some) good-sounding, reasonably-transparent,
low-noise analog gear that we can use to add a
little sugar, salt, pepper, or spice, or simply to
prevent the sound from getting colder

• a digital processor that simulates analog 
distortion or warmth. 


Why Is Good DSP So Expensive? 

Intellectual property is the most nebulous thing 
to a consumer. It’s easy to see why a two-ton
Mercedes Benz costs so much, but the amount of
intellectual work that has gone into a one-gram IC is
not so obvious. It can take five man-years to
produce good audio software, created by individuals
with ten or more years of schooling or experience.
Similarly, when the doctor takes ten minutes to
examine you, prescribes a 10-cent pill and then
presents you with a $100 invoice, remember you're
paying for all that knowledge and experience. This 
doesn’t mean I’m against socialized medicine, I just 
want to re-emphasize the reasons why intellectual 
property and good DSP are so expensive. 

The Source-Quality Rule 

An important corollary of this discussion is the 
source-quality rule: Source recordings and masters 
should have higher resolution than the eventual release 
medium. Always start out with the highest
resolution source and maintain that resolution
for as long as possible into the processing. When
mastering, one consequence of this rule is to reduce
the number of generations and copies, and if
possible, go back one or more generations when a
new process must be added or applied.

This rule even applies when you’re making an 
MP3 or other data-reduced final result. Consider a
lossy medium like the (rapidly obsolescing) analog
cassette. Dub to cassette from a high quality source,
like a CD, and it sounds much better than a copy
from an inferior source, like the FM radio, by
avoiding cumulative bandwidth losses, as wider
bandwidth sounds better. In other words, the higher
the audio quality you begin with, the better the final
product, whether it’s an audiophile CD, a multimedia
CD-ROM, MP3, or a talking Barbie doll. It
may seem funny, but you'll never go wrong starting
at 96 kHz/24 bit if the product is to end up on 44.1
kHz/16 bit CD. Sample rate conversion should be the
penultimate process, followed by dithering.
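The ordering of those last two steps can be sketched as follows. This is a deliberately crude illustration, not a mastering-grade converter: the 63-tap Hann-windowed sinc filter, the 2:1 ratio (88.2 kHz to 44.1 kHz), the plain TPDF dither without noise shaping, and all function names are assumptions made for the example.

```python
import math
import random


def lowpass_taps(num_taps=63, cutoff=0.22):
    """Hann-windowed sinc low-pass; cutoff is a fraction of the INPUT rate."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - centre
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def halve_sample_rate(samples, taps):
    """Penultimate step: low-pass filter, then keep every second sample."""
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i - k
            if 0 <= j < len(samples):
                acc += tap * samples[j]
        out.append(acc)
    return out


def dither_to_16_bit(samples, rng):
    """Final step: add TPDF dither (+/-1 LSB peak), then round to 16-bit integers."""
    out = []
    for s in samples:
        tpdf = rng.random() - rng.random()  # triangular distribution, in LSBs
        q = round(s * 32767 + tpdf)
        out.append(max(-32768, min(32767, q)))
    return out


rng = random.Random(0)
# 10 ms of a 1 kHz tone at half scale, held at high resolution (floats at 88.2 kHz)
hi_res = [0.5 * math.sin(2 * math.pi * 1000 * n / 88200) for n in range(882)]
cd_ready = dither_to_16_bit(halve_sample_rate(hi_res, lowpass_taps()), rng)
```

The point of the ordering is that the signal stays at high resolution through the rate conversion and is reduced to 16 bits only once, at the very end.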

In Summary 

Mastering engineers do not have to think about 
the meaning of life every time they perform their 
magic; many engineers simply plug in their 
processors, listen, and make music sound better.
But I also like to consider just why things sound
better, because it helps me avoid problems that are
not obvious at first listen, and also dream up
innovative solutions. I hope that this chapter has
inspired you to dream up some innovations of your 
own! 


1 See the Appendix for references on noise filters. Ironically, all the standard
noise-weighting filters should be revised, because they have no relationship
with human perception of very quiet devices such as A/D and D/A converters.

2 And even then, the F-curve is an approximation, since the ear’s perception of
noise is much more than just a frequency response curve, as Jim Johnston
explains: Noise should be measured separately in each critical band and
compared to the ear’s threshold for that critical band.

3 Most of the SpectraFoo™ screenshots were taken at an FFT resolution of 32K
points (32000 "bins") with about 4 second average time and Hanning
weighting. The actual amplitude of details on an FFT depends on its resolution,
so FFTs are only directly comparable if the same methods are used.

4 The term resolving, when applied to the sound of tube circuits, is itself an
unquantifiable audiophile subjective term. It’s fair to say that audiophile
negative reactions to some ugly-sounding solid-state circuits use inexact terms
such as resolution and transparency, which may be proved to be simply distribution
of harmonics or differences in frequency response. And maybe not!

5 For the curious, K-Stereo and K-Surround do not use harmonic distortion to
enhance depth. They use other psychoacoustic principles.

6 Although the makers of the double-sampling Weiss Equalizer, GML plugin,
and the Audiocube feel that double sampling is important for equalizers. Some
engineers like the sound of high frequency curves that extend beyond 20 kHz,
even if that is later cut off when the sample rate is halved at the output of the
equalizer. And Jim Johnston (in correspondence) states that when a digital
filter has response extending to half the sampling rate, it can produce some
really odd and unexpected frequency responses, indicating that double
sampling is important for such type of equalizers.





Chapter 17


How to Achieve Depth and Dimension in Recording,
Mixing and Mastering


I. Introduction 


I placed this acoustics lesson in the middle of a
book on mastering because the creation of
wonderful audio masters requires that some basic
acoustic principles be understood. As we enter the
era of surround recording and reproduction, many
mix engineers are repeating their mistakes from
two-channel work: panpotting mono instruments
to discrete locations, and then adding multiple
layers of uncorrelated stereophonic reverb "wash"
in a vain and misguided attempt to create space and
depth. It’s important to learn how to manipulate the
surprising depth available from 2-channel canvas
before moving on to multi-channel surround.

It amazes me how few engineers know how to 
fully use good ol’ fashioned 2-channel stereo. I’ve 
been making "naturalistic” 2-channel recordings 
for many years taking advantage of room acoustics, 
but it is also possible to use artificial means to 
simulate depth, and there are many engineers
working in the pop field who know how to do so.
Learn to discern the audible difference between
simple pan-potted mono, and recordings which
simulate or utilize the reflections from nearby walls
to create a real sense of depth. Without such 
knowledge, your recordings will tend to produce a 
vague, undefined image; the musical instruments 
will be obscured and unclear. 

Techniques here include using the Haas¹ effect,
particularly when implemented binaurally, use of
delays and alteration of phase, more naturalistic
reverberators, and understanding how to unmask
via placement. Also be aware that well-engineered
2-channel recordings have encoded ambience
information which can be extracted to multichannel,
and it pays to learn about these techniques.

Depth Perception in Real Rooms 

Early Reflections versus Reverberation 

At first thought, it may seem that depth in a 
recording can be achieved simply by increasing the 
proportion of reverberant to direct sound. But the 
artificial simulation of depth is a much more 
complex process. Our binaural hearing apparatus is
largely responsible for the perception of depth and
space, decoding the various early reflections from
nearby walls that support and strengthen the sound
of musical instruments and voices. First, we must
define the terms early reflections and reverberation.
Early reflections consist of the part of the room
sound within approximately the first 50-100
milliseconds. There is a great deal of correlation
between the direct sound and the early reflections;
you can think of the early reflections as being
attached to the direct sound. In a large and diffuse
room, after about 100 milliseconds, enough wall
bounces have occurred to make it impossible to hear
discrete bounces; this is the onset of random
(uncorrelated) reverberation, which we can say is
detached from the direct sound. That’s why it is the
early reflections, even more than the reverberation,
which largely affect our perception of the depth of
the sound, giving it shape and dimension. The ear’s
decoding ability is such that a few simple well-placed
echoes actually solidify and clarify the location
of the direct sound; this is why a simple, dead,
panpotted mono source (without early reflections)
is so hard to locate precisely.


Masking Principle/Haas Effect 

Recording engineers were concerned with 
achieving depth even in the days of monophonic 
sound. In those days, many halls for orchestral 
recording were deader than those of today. Why do
monophonic recording and dead rooms seem to go
well together? The answer is involved in two
principles that work hand in hand: 1) The masking
principle and 2) The Haas effect.

The Masking Principle and Mono versus Stereo
Recordings

The masking principle says that a louder sound
will tend to cover (mask) a softer sound, especially if
the two sounds lie in the same frequency range. If
these two sounds happen to be the direct sound
from a musical instrument and the reverberation
from that same instrument, then the initial
reverberation can appear to be covered by the direct
sound. When the direct sound ceases, the
reverberant hangover is finally perceived. This is
why in mixing, we often add a small delay between
the direct sound and the reverberation; it helps the
ears to separate one from the other, reducing the
masking.

In concert halls, our two ears sense 
reverberation as coming diffusely from all around 
us, and the direct sound as having a distinct single 
location. Thus, when music is perceived binaurally, 
there is less masking because the direct and 
reverberant sound come from different directions. 
However, in monophonic recording, the 
reverberation is reproduced from the same source 
speaker as the direct sound, and so we may perceive
the room as deader than it really is, because the two
sounds overlap directionally. Furthermore, if we
choose a recording hall that is very live, then the
reverberation will tend to intrude on our perception
of the direct sound, since in monaural, both will be
reproduced from the same location: the single
speaker.

This is one explanation for the incompatibility
of many stereophonic recordings with monophonic
reproduction. The larger amount of reverberation
tolerable in stereo becomes less acceptable in mono
due to the physical overlap. As we extend our
recording techniques to 2-channel (and
multichannel) we can overcome masking problems
by spreading artificial reverberation spatially away
from the direct source, achieving both a clear
(intelligible) and warm recording at the same time.
One of the first tricks that mix engineers learn is to
put reverberation in the opposite channel from the
source. This helps unmask the sound, but can
produce an unnatural effect.² As we get more
sophisticated, we discover that instead of hard-panning
the source and its mono echo or reverb
return, using multiple delays or stereophonic early
reflections can yield a far more cohesive, natural
effect. The presence of the stereophonically-spread
early reflections also serves to clarify the location of
the dry source. In a sophisticated stereo mix,
engineers take advantage of variations on these
themes to produce variety and space in the
recording.

The Haas Effect 

The Haas effect can help overcome masking. In
general, Haas says that echoes occurring within
approximately 40 milliseconds of the direct sound
become fused with the direct sound. We say that the
echo becomes "one" with the direct sound, and only
a loudness enhancement occurs; this is what
happens in a real room with the earliest wall and
floor reflections. Since the velocity of sound is
approximately one foot per millisecond, 40
milliseconds corresponds to a wall that’s 20 feet
distant, because the reflection travels to the wall
and back (assuming a flat wall perpendicular to the
angle of the direct sound).
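The arithmetic behind that figure can be written out as a small sketch. It uses the one-foot-per-millisecond rule of thumb from the text (the true speed of sound is nearer 1.13 feet per millisecond) and assumes the source and listener are at about the same spot, facing a flat perpendicular wall. The function names are hypothetical.

```python
FEET_PER_MS = 1.0  # rule of thumb from the text; roughly 1.13 ft/ms in reality


def echo_delay_ms(wall_distance_ft):
    """Delay of the wall reflection: the sound travels to the wall and back."""
    return 2 * wall_distance_ft / FEET_PER_MS


def wall_distance_ft(delay_ms):
    """Distance of the wall that would produce a given echo delay."""
    return delay_ms * FEET_PER_MS / 2


def delay_in_samples(delay_ms, sample_rate_hz):
    """Length of the equivalent digital delay line."""
    return round(delay_ms * sample_rate_hz / 1000)


fusion_limit_wall = wall_distance_ft(40)     # 20.0 feet
samples_at_cd_rate = delay_in_samples(40, 44100)  # 1764 samples
```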

A very important corollary to the Haas effect
says that fusion (and loudness enhancement) will
occur even if the closely-timed echo comes from a
different direction than the original source.
However, the brain will continue to recognize
(binaurally) the location of the original sound as the
proper direction of the source. The Haas effect
allows nearby echoes (greater than about 10 ms and
less than about 40 ms delay) to enhance and
reinforce an original sound without confusing its
directionality. The maximum definition of the
source’s directionality will occur using the longest
delay possible that is not perceived as a discrete echo.

The Magic Surround 

We can take advantage of the Haas effect to 
naturally and effectively convert an existing 2- 
channel recording to a 4-channel or surround 
medium. When remixing, place a discrete delay in
the surround speakers to enhance and extract the 
original ambience from a previously recorded 
source! No artificial reverberator is needed if there 
is sufficient reverberation in the original source. 
Here’s how it works: 




Because of the Haas effect, when the delay and 
source are correlated (e.g., a snare drum hit) the ear 
fuses them, and so still perceives the direct sound as 
coming from the front speakers. But this does not 
apply to ambience because it is uncorrelated—the 
ear does not recognize the delay as a repeat, and 
thus ambience will be spread, diffused between the
location of the original sound and the location of the
delay (in the surround speakers). Thus, the Haas
effect only works for correlated material;
uncorrelated material (such as natural
reverberation) is extracted, enhanced, and spread
directionally. Dolby Laboratories calls this effect the
magic surround, for they discovered that natural
reverberation was extracted to the rear speakers
when a delay was applied to them. Dolby also uses an
L-minus-R matrix to further enhance the
separation. The wider the bandwidth of the
surround system and the more diffuse its character,
the more effective the psychoacoustic extraction of 
ambience to the surround speakers. 
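A minimal sketch of the idea, assuming equal-length lists of left and right samples: the surround feed is a delayed copy of the program, optionally formed from the L-minus-R difference so that centre (fully correlated) material cancels. The function name, the parameters, and the tiny sample data are illustrative assumptions and do not reproduce any actual Dolby decoder.

```python
def surround_feed(left, right, delay_samples, use_difference=True):
    """Derive a delayed surround signal from an existing 2-channel mix."""
    if use_difference:
        # L-minus-R: material identical in both channels cancels out
        mono = [l - r for l, r in zip(left, right)]
    else:
        mono = [0.5 * (l + r) for l, r in zip(left, right)]
    delayed = [0.0] * delay_samples + mono
    return delayed[:len(mono)]  # keep the original length


# Hypothetical sample data; a real delay would be chosen inside the Haas
# fusion range, e.g. 20 ms = 882 samples at 44.1 kHz.
left = [1.0, 2.0, 3.0, 4.0]
right = [1.0, 0.0, 3.0, 0.0]
rear = surround_feed(left, right, delay_samples=2)   # [0.0, 0.0, 0.0, 2.0]
centre_only = surround_feed(left, left, delay_samples=1)  # all zeros
```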

Haas In Mixing 

There’s more to Haas than this simple
explanation. To become proficient in using Haas in
mixing, you can study the original papers which
discuss the various fusion effects at different delay
and amplitude ratios. During mixing, remember the
1 foot per millisecond relationship, and see what
happens with carefully-placed and leveled delays in
the 12 to 40 millisecond range. You will discover
that they can enhance an instrument’s clarity and
position all due to psychoacoustics: the ear’s own
decoding power.³ In fact, Haas delays are far more
effective than equalization at repairing the sound of
a drumset which was recorded in a dead room, for
example. Furthermore, multiplying the delays until
they simulate the complex early reflections of real
rooms can greatly improve our stereo mixing
technique. More than a few delays is beyond our
ability to do on a simple mixing board, and for early
reflections we must use computerized simulations
found in devices such as the TC Electronic, EMT,
and certain models of Sony reverbs. The latest
algorithm from TC, currently only available in the
System 6000, is quite astounding.
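For readers who want to experiment, here is a small sketch of mixing a dry mono source with a few leveled, panned delays. The delay times, levels, and pan positions are arbitrary assumptions picked from the 12 to 40 millisecond range mentioned above, the linear pan law is a simplification, and the function names are hypothetical. It is nowhere near the dense early-reflection patterns of the dedicated processors.

```python
def db_to_gain(level_db):
    """Convert a level in decibels to a linear gain factor."""
    return 10 ** (level_db / 20)


def pan_gains(pan):
    """Simple linear pan law: pan = -1 is hard left, +1 is hard right."""
    return (1 - pan) / 2, (1 + pan) / 2


def mix_with_delays(dry, sample_rate_hz, dry_pan, taps):
    """taps is a list of (delay_ms, level_db, pan) tuples added to the dry source."""
    n = len(dry)
    left, right = [0.0] * n, [0.0] * n
    gain_l, gain_r = pan_gains(dry_pan)
    for i, s in enumerate(dry):
        left[i] += gain_l * s
        right[i] += gain_r * s
    for delay_ms, level_db, pan in taps:
        d = round(delay_ms * sample_rate_hz / 1000)
        g = db_to_gain(level_db)
        gain_l, gain_r = pan_gains(pan)
        for i in range(d, n):
            left[i] += g * gain_l * dry[i - d]
            right[i] += g * gain_r * dry[i - d]
    return left, right


# A single click panned hard left, with two hypothetical "wall" delays.
# A 1 kHz sample rate is used only so that one sample equals one millisecond.
click = [1.0] + [0.0] * 99
out_l, out_r = mix_with_delays(click, 1000, dry_pan=-1.0,
                               taps=[(12, -6.0, 1.0), (25, -9.0, 0.0)])
```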

Haas In Mastering 

We often receive recordings for mastering 
which lack depth, spatiality and clarity because the 
mix engineer did not mix the early reflections or 
reverberation well enough or loudly enough. But 
since the mix has already been made, adding 
artificial reverberation can muddy the sound. This 
is why an ambience extraction technique should be 
employed instead. My K-Stereo processor, model 
DD-2, can enhance the depth of existing stereo 
mixes by extracting and spatially-spreading their 
inherent ambience. 

Haas’ Relationship To Natural Environments 

In a good stereo recording, the early correlated 
room reflections are captured with their correct 
placement; they support the original sound, help us 
locate the sound source as to distance and do not
interfere with left-right orientation. The later
uncorrelated reflections, which we call
reverberation, naturally contribute to the
perception of distance, but because they are
uncorrelated with the original source, the
reverberation does not help us locate the original
source in space. If the recording engineer uses
stereophonic miking techniques and a more lively 
room instead, capturing early reflections on two 
tracks of the multitrack, the remixing engineer will 
need less artificial reverberation and what little he 
adds can be done convincingly. 

Using Frequency Response to Simulate Depth

Another contributor to the sense of distance in
a natural acoustic environment is the absorption
qualities of air. As the distance from a sound source
increases, the apparent high frequency response is
reduced. This provides another tool which the
recording engineer can use to simulate distance, as
our ears have been trained to associate distance with
high-frequency rolloff. An interesting experiment
is to alter a treble control while playing back a good
orchestral recording. Notice how the apparent
front-to-back depth of the orchestra changes
considerably as you manipulate the high
frequencies. 
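A very crude stand-in for that treble control is a one-pole low-pass filter, sketched below. Real air absorption depends on distance, frequency, temperature and humidity in ways this does not model; the 4 kHz corner frequency and the function name are assumptions chosen only to make the rolloff obvious.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Gently roll off high frequencies (about 6 dB per octave above cutoff)."""
    a = math.exp(-2 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for x in samples:
        y = (1 - a) * x + a * y  # each output leans on the previous output
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


rate = 44100
low_tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate // 10)]
high_tone = [math.sin(2 * math.pi * 10000 * n / rate) for n in range(rate // 10)]

low_ratio = rms(one_pole_lowpass(low_tone, 4000, rate)) / rms(low_tone)
high_ratio = rms(one_pole_lowpass(high_tone, 4000, rate)) / rms(high_tone)
```

With these settings the 100 Hz tone passes almost unchanged while the 10 kHz tone is reduced to well under half its level, which is the kind of change the listening experiment above asks you to evaluate by ear.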

Recording Techniques in Natural Rooms to Achieve 
Front-To-Back Depth 

Balancing the Orchestra with only a few
microphones (minimalist). A musical group is shown
in a hall cross section (see diagram at right). Various
microphone positions are indicated by letters A-F.

Microphones A are located very close to the
front of the orchestra. As a result, the ratio of A’s
distance from the back compared to the front is very
large. Consequently, the front of the orchestra will
be much louder in comparison to the rear, and the
amount of early reflections reaching the
microphone from the rear will be far greater than
from the front. Front-to-back balance will be
exaggerated. However, there is much to be said in
favor of mike position A, since the conductor usually
stands there, and he purposely places the softer
instruments (strings) in the front, and the louder
(brass and percussion) in the back, somewhat
compensating for the level discrepancy due to
location. Also, the radiation characteristics of the
horns of trumpets and trombones help them to
overcome distance. These instruments frequently
sound closer than other instruments located at the 
same physical distance because the focus of the horn 
increases direct to reflected ratio. Notice that 
orchestral brass often seem much closer than the 
percussion, though they are placed at similar 
distances. You should take these factors into account 
when arranging an ensemble for recording. Clearly, 
we perceive depth by the larger proportion of 
reflected to direct sound for the back instruments. 


[Figure: hall cross section showing microphone positions A-F, with the back of stage, front of stage, and critical distance marked.]

The farther back we move in the hall, the
smaller the ratio of back-to-front distance, and the
front instruments have less advantage over the rear.
At position B, the brass and percussion are only two
times the distance from the mikes as the strings.
This (according to theory) makes the back of the
orchestra 6 dB down compared to the front, but
much less than 6 dB in a reverberant hall, because
level changes less with distance.
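The theoretical figure comes from the inverse-square law, which holds only for direct sound in a free field. A short sketch, with a hypothetical function name and made-up distances for a close position:

```python
import math


def level_drop_db(near_distance, far_distance):
    """Free-field level difference between two sources at different distances."""
    return 20 * math.log10(far_distance / near_distance)


position_b = level_drop_db(1.0, 2.0)       # rear at twice the distance: about 6 dB
close_position = level_drop_db(5.0, 30.0)  # hypothetical 5 ft vs 30 ft: about 15.6 dB
```

In a real hall the reverberant field fills in some of this difference, so the measured drop is smaller, as noted above.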

For example, in position C, the microphones are 
beyond the critical distance—the point where direct 
and reverberant sound are equal. If the front of the 
orchestra seems too loud at B, position C will not 
solve the problem; it will have similar front-back 
balance but be more buried in reverberation. 

Using Microphone Height To Control Depth And
Reverberation

Changing the microphone’s height allows us to
alter the front-to-back perspective independently 
of reverberation. Position D has no front-to-back 
depth, since the mikes are directly over the center of 
the orchestra. Position E is the same distance from 
the orchestra as A, but being much higher, the 
relative back-to-front ratio is much less. At E we 
may find the ideal depth perspective and a good 
level balance between the front and rear 
instruments. If even less front-to-back depth is 
desired, then F may be the solution, although with 
more overall reverberation and at a greater distance. 
Or we can try a position higher than E, with less 
reverb than F. 

Directivity Of Musical Instruments 

Frequently, the higher up we move the mike, the 
more high frequencies it will capture, especially from 
the strings. This is because the high frequencies of 
many instruments (particularly violins and violas)
radiate upward as well as forward. The high frequency
factor adds more complexity to the problem, since it 
has been noted that treble response affects the 
apparent distance of a source. Note that when the 
mike moves past the critical distance in the hall, we 
may not hear significant changes in high frequency 
response when height is changed. 

The recording engineer should be aware of how 
all the above factors affect the depth picture so he can 
make an intelligent decision on the mike position to
try next. The difference between a B+ recording and 
an A+ recording can be a matter of inches. 

Beyond Minimalist Recording 

The engineer/producer often desires additional
warmth, ambience, or distance after finding the
mike position that achieves the perfect instrumental
balance. In this case, moving the mikes back into
the reverberant field cannot be the solution.
Another call for increased ambience is when the hall
is a bit dry. In either case, trucking the entire
ensemble to another hall may be tempting, but is
not always the most practical solution.

The minimalist approach is to change the
microphone pattern(s) to less directional (e.g.,
omni or figure-8). But this can get complex, as each
pattern demands its own spacing and angle.
Simplistically speaking, with a constant distance, 
changing the microphone pattern affects direct to 
reverberant ratio. 

Perhaps the easiest solution is to add ambience 
mikes. If you know the principles of acoustic phase
cancellation, adding more mikes is theoretically a
sin. However, acoustic phase cancellation does not
occur when the extra mikes are placed purely in the
reverberant field, for the reverberant field is
uncorrelated with the direct sound. The problem, of
course, is knowing when the mikes are deep enough
in the reverberant field. Proper application of the
3 to 1 rule⁴ will minimize acoustic phase cancellation.
So will careful listening. The ambience mikes
should be back far enough in the hall, and the hall
must be sufficiently reverberant so that when these
mikes are mixed into the program, no deterioration 
in the direct frequency response is heard, just an 
added warmth and increased reverberation. 
Sometimes halls are so dry that there is distinct, 
correlated sound even at the back, and ambience 
mikes would cause a comb filter effect. 
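To see why correlated sound in the ambience mikes is a problem, consider the idealized case of a signal mixed with an equal-level delayed copy of itself: cancellations fall at odd multiples of 1/(2 x delay). The sketch below lists them; the function name and the 30 ms example (roughly 30 feet of extra path by the one-foot-per-millisecond rule) are assumptions for illustration.

```python
def comb_notch_frequencies(delay_ms, max_hz=20000.0):
    """Notch frequencies when a signal is summed with an equal-level delayed copy."""
    delay_s = delay_ms / 1000.0
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)  # odd multiples of 1 / (2 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


one_ms = comb_notch_frequencies(1.0, max_hz=2000.0)  # about 500 Hz and 1500 Hz
hall_mikes = comb_notch_frequencies(30.0)            # notches every 33.3 Hz or so
```

In practice the distant copy is lower in level and only partly correlated, so the notches are shallower than this ideal case suggests, but they still colour the direct frequency response.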

Assuming the added ambience consists of 
uncorrelated reverberation, then in principle an
artificial reverberation chamber should accomplish
similar results to those obtained with ambience
microphones. In practice, however, this has to be a
qualified yes, by assuming not only that the artificial
reverberation chamber has a true stereophonic 
response and is consonant with the sound of the 
original recording hall, but also that the main 
microphones have picked up sufficient early 
reflections for the depth effect to be convincing. 
Artificial reverberation alone, being uncorrelated, 
will not help the imaging or produce a focused 
depth picture. 

What happens to the depth and distance picture 
of the orchestra as the ambience is added? In 
general, the front-to-back depth of the orchestra 
remains the same or increases minimally, but the 
apparent overall distance will increase as more
reverberation is mixed in. The change in depth may
not be linear for the whole orchestra since the 
instruments with more dominant high frequencies 
may seem to remain closer even with added 
reverberation. 

The Influence of Hall Characteristics on Recorded 
Front-To-Back Depth 

In general, given a fixed microphone distance,
the more reverberant the hall, the farther back the
rear of the orchestra will seem. In one problem hall
the reverberation is much greater in the upper bass
frequency region, particularly around 150 to 300 Hz.
A string quartet usually places the cello in the back.
Since that instrument is very rich in the upper bass
region, in this problem hall the cello always sounds
farther away from the mikes than the second violin,
which is located at his right. Strangely enough, a
concert-goer in this hall does not notice the extra
sonic distance because his strong visual sense 
locates the cello easily and does not allow him to 
notice an incongruity. When she closes her eyes, 
however, the astute listener notices that, yes, the 
cello sounds farther back than it looks! 

It is therefore rather difficult to get a proper 
depth picture with a pair of microphones in this 
problem hall. Depth seems to increase almost 
exponentially when low frequency instruments are 
placed only a few feet away. It is especially difficult 
to record a piano quintet in this hall because the low 
end of the piano excites the room and seems hard to 
locate spatially. The problem is aggravated when the
piano is on half-stick, cutting down the high 
frequency definition of the instrument. 




The miking solution I choose for this problem is
a compromise: close mike the piano, and mix this
with a panning position identical to the piano’s
virtual image arriving from the main mike pair. I
can only add a small portion of this close mike
before the apparent level of the piano is taken above
the balance a listener would hear in the hall. The
close mike helps solidify the image and locate the
piano. It gives the listener a little more direct sound
on which to focus.

Can minimalist techniques work in a dead
studio? Not very well. My observations are that
simple miking has no advantage over multiple
miking in a dead room. I once recorded a horn
overdub in a dead room, with six tracks of close
mikes and two for a more distant stereo pair. In this
dead room there were no significant differences
between the sound of the minimalist pair and the
six multiple mono close-up mikes! (The close mikes
were, of course, carefully equalized, leveled and
panned from left to right.) This was a surprising
discovery, and it reinforces the importance of good
hall acoustics and especially early reflections on a
musical sound. In other words, when there are no
significant early reflections, you might as well
choose multiple miking, with its attendant post-production
balance advantages.

Miking Techniques and the Depth Picture 

Coincident Microphones. The various simple 
miking techniques reveal depth to greater or lesser 
degree. Microphone patterns which have out of 
phase lobes (e.g., hypercardioid and figure-8) can 
produce an uncanny holographic quality when used
in properly angled pairs. Even tightly-spaced
(coincident) figure-8s can give as much of a depth
picture as spaced omnis. But coincident miking
reduces time ambiguity between left and right
channels, and sometimes we seek that very
ambiguity. Thus, there is no single ideal minimalist
technique for good depth, and you should become
familiar with changes in depth produced by
changing mike spacing, patterns, and angles. For
example, with any given mike pattern, the farther
apart the microphones of a pair, the wider the stereo
image of the ensemble. Instruments near the sides
tend to pull more left or right. Center instruments
tend to get wider and more diffuse in their image
picture, harder to locate or focus spatially.

The technical reasons for this are tied in to the
Haas effect for delays of under approximately 5 ms
vs. significantly longer delays. With very short
delays between two spatially located sources, the
image location becomes ambiguous. A listener can
experiment with this effect by mistuning the
azimuth on an analog two-track machine and
playing a mono tape over a well-focused stereo
speaker system. When the azimuth is correct, the
center image will be tight and defined. When the
azimuth is mistuned, the center image will get wider
and acoustically out of focus. Similar problems can
(and do) occur with the mike-to-mike time delays
always present in spaced-pair techniques.
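
As a rough illustration of why short inter-channel delays matter when two channels are summed, the sketch below lists the comb-filter notch frequencies for an idealized case: two equal-level copies of one signal with a pure time delay between them. This is a simplification and not a model of any real microphone array; the function names and the 343 m/s speed of sound are assumptions made for the example.

```python
def path_difference_delay_s(path_difference_m, speed_of_sound_m_s=343.0):
    """Arrival-time difference (seconds) for an extra path length in metres.

    343 m/s is an assumed speed of sound for air at roughly 20 degrees C.
    """
    return path_difference_m / speed_of_sound_m_s


def comb_notch_frequencies(delay_s, max_hz=20000.0):
    """Notch frequencies (Hz) when a signal is summed with an equal-level
    copy of itself delayed by delay_s seconds.

    Cancellation happens where the delay equals an odd number of half
    periods, i.e. f = (2k + 1) / (2 * delay) for k = 0, 1, 2, ...
    """
    if delay_s <= 0:
        return []
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    # Hypothetical example: one mike is 0.343 m farther from the source.
    delay = path_difference_delay_s(0.343)
    print("delay (ms):", delay * 1000.0)
    print("first notches (Hz):", comb_notch_frequencies(delay)[:3])
```

The shorter the delay, the higher the first notch and the wider the spacing between notches, which is one way to see why sub-millisecond offsets change tone and focus rather than being heard as a separate echo.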

Spaced microphones. I have found that when
spaced mike pairs are used, the depth picture also
appears to increase, especially in the center. For
example, the front line of a chorus will no longer
seem straight. Instead, it appears to be on an arc
bowing away from the listener in the middle. If
soloists are placed at the left and right sides of this
chorus instead of in the middle, a rather pleasant
and workable artificial depth effect will occur.
Therefore, do not rule out the use of spaced-pair
techniques. Adding a third omnidirectional mike in
the center of two other omnis can stabilize the center
image, and proportionally reduces center depth.

Multiple Miking. I have described how
multiple close mikes destroy the depth picture; in
general I stand behind that statement. But soloists
do exist in orchestras, and for many reasons, they
are not always positioned in front of the group.

When looking for a natural depth picture, try to
move the soloists closer instead of adding additional
mikes, which can cause acoustic phase cancellation.
But when the soloist cannot be moved, plays too
softly, or when hall acoustics make him sound too
far back, then one or more spot mikes must be added.
When the close solo mikes are a properly placed
stereo pair and the hall is not too dead, the depth
image will seem more natural than one obtained
with a single solo mike.

To avoid problems, apply the 3 to 1 rule. Also,
listen closely for frequency response problems
when the close mike is mixed in. As noted, the live
hall is more forgiving. The close mike (not
surprisingly) will appear to bring the solo
instrument closer to the listener. If this practice is
not overdone, the effect is not a problem as long as
musical balance is maintained, and the close mike
levels are not changed during the performance.
We’ve all heard recordings made with this
disconcerting practice. Trumpets on roller skates?
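
The 3 to 1 rule says the distance between microphones should be three times the distance from each microphone to its own source. A minimal sketch of the arithmetic behind it follows, assuming idealized free-field inverse-square-law behavior (no room reflections, omnidirectional pickup) and assuming the leakage path is roughly three times the direct path. Real rooms and directional mikes will differ, and the function names are illustrative only.

```python
import math


def level_drop_db(near_m, far_m):
    """Level difference in dB between the same source picked up at near_m
    and at far_m, under a free-field inverse-square-law assumption."""
    return 20.0 * math.log10(far_m / near_m)


def satisfies_3_to_1(mic_to_source_m, mic_to_mic_m):
    """True when the mike spacing is at least three times the
    mike-to-source distance."""
    return mic_to_mic_m >= 3.0 * mic_to_source_m


if __name__ == "__main__":
    # Hypothetical setup: each mike 0.5 m from its source, mikes 1.5 m apart.
    print(satisfies_3_to_1(0.5, 1.5))
    # Leakage travelling about three times as far arrives about 9.5 dB down.
    print(round(level_drop_db(0.5, 1.5), 1))
```

With the leakage roughly 9.5 dB below the direct pickup, the cancellations from summing the two mikes are much shallower than they would be at equal level.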

Delay Mixing. At first thought, adding a delay to
the close mike seems attractive. While this delay will
synchronize the direct sound of that instrument
with the direct sound of that instrument arriving at
the front mikes, the single delay line cannot
effectively simulate the other delays of the multiple
early room reflections surrounding the soloist. The
multiple early reflections arrive at the distant mikes
and contribute to direction and depth. They do not
arrive at the close mike with significant amplitude
compared to the direct sound entering the close
mike. Therefore, while delay mixing may help, it is
not a panacea. To adjust the delay of the solo mike(s)
properly, start with a delay calculated by the relative
distance between the solo mike and the main mike,
then focus the delay up and down in 1 ms
increments until the sound is most coherent and
focused and the soloist sounds clearest.
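
The starting delay described above is simply the distance between the solo mike and the main mike divided by the speed of sound. The sketch below works that out and lists the 1 ms steps to audition around it. The 343 m/s figure (air at about 20 degrees C) and the function names are assumptions for illustration, and the final setting is still chosen by ear.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def spot_mic_delay_ms(distance_to_main_m, speed=SPEED_OF_SOUND_M_S):
    """Starting delay (ms) for a spot mike located distance_to_main_m
    metres from the main pair."""
    return 1000.0 * distance_to_main_m / speed


def candidate_delays_ms(start_ms, span_ms=3, step_ms=1):
    """Delays to audition around the calculated start, in step_ms steps.
    Negative delays are dropped because they cannot be applied."""
    n = int(span_ms // step_ms)
    values = [start_ms + k * step_ms for k in range(-n, n + 1)]
    return [v for v in values if v >= 0]


if __name__ == "__main__":
    start = spot_mic_delay_ms(6.0)  # hypothetical: soloist mike 6 m from the main pair
    print(round(start, 1), "ms")
    print([round(d, 1) for d in candidate_delays_ms(start)])
```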

Influence Of The Control Room Environment On 
Perceived Depth 

At this point, many engineers may say, “I’ve
never noticed depth in my control room!” The
widespread practice of placing near-field monitors
on the meter bridges of consoles kills almost all
sense of depth. Comb-filtering, speaker diffraction
and sympathetic vibrations from nearby surfaces
destroy the perception of delicate time and spatial
cues. The recent advent of smaller virtual control
surfaces has helped reduce the size of consoles, but
seek advice from an expert acoustician if you want to
appreciate or manipulate depth in your recordings.




Examples To Check Out 

Standard multitrack music recording
techniques make it difficult for engineers to achieve
depth in their recordings. Mixdown tricks with
reverb and delay may help, but good engineers
realize that the best trick is no trick: learn how to
use stereo pairs in a good acoustic. Here are some
examples of audiophile recordings I’ve made that
purposely take advantage of depth and space, both
foreground and background, on Chesky Records.
Sara K., Hobo, Chesky JD155. Check out the
percussion on track 3, “Brick House.” Johnny Frigo,
Debut of a Legend, Chesky JD119. Check out the
sound of the drums and the sax on track 9, “I Love
Paris.” Ana Caram, The Other Side of Jobim, Chesky
JD73. Check out the percussion, cello and sax on
“Correnteza.” Carlos Heredia, Gypsy Flamenco,
Chesky WO126. Play it loud! And listen to track 1 for
the sound of the background singers and handclaps.
Phil Woods, Astor and Elis, Chesky JD146, for the
natural-sounding combination of intimacy and
depth of the jazz ensemble.

Technological Impediments to 
Capturing Recorded Depth 

Depth is the first thing to suffer when
technology is incorrectly applied. Here is a
summary of some of the technical practices that,
when misused or accumulated, can contribute to a
boringly flat, depthless recorded picture:

• Multitrack and multimike techniques

• Small/dead recording studios or large rooms with
poor acoustics/missing early reflections

• Low resolution recording media

• Amplitude compression

• Improper use of dithering, cumulative digital
processing, and low-resolution digital processing
(e.g., using single-precision as opposed to double
or higher-precision)

In Summary: When recording, mixing and
mastering, use the highest resolution technology,
best miking techniques, and room acoustics.
Process dead tracks with Haas delays and early
reflections, and specialized ambience recovery
tools. Then you’ll resurrect the missing depth in
your recordings.


1 Haas, Helmut (1951), Acustica. The original article is in German. Various
English-speaking authors have written their interpretations of Haas, which
you can find in any decent textbook on audio recording techniques.

2 Even if unnatural, it can be interesting nevertheless. Listen to 1960’s-1970’s era
rock recordings from the Beatles, Beach Boys, Lovin’ Spoonful, The Supremes,
Tommy James and the Shondells, and many more, where mono instruments or
vocals are panned to one side, and often their reverb return completely to the
other side.

3 When adding Haas delays, listen closely in mono, because improper delay
ratios can cause comb filtering in mono. A small degradation in mono may be
tolerable if the improvement is significant in stereo. Early reflections, due to
their more complex nature, are more compatible with mono folddowns than
simple Haas delays.

4 Burroughs, Lou (1974). Microphones: Design and Application. Sagamore
Publishing Company. (Out of Print). Burroughs quantified the effects of
acoustic phase cancellation (comb filtering, interference) with real
microphones and real rooms, and devised this rule: The distance between
microphones should be three times the distance between each microphone
and the source of the sound to which it is being applied. This is particularly
important to avoid comb-filtering when both microphones are feeding a
single channel; when the microphones are feeding different channels
(e.g. stereo), the degradation will be much less noticeable in stereo but still
be a problem in mono.





Chapter 18

High Sample Rates: Is This Where It’s At?

I. Introduction


Now that we’ve cured the wordlength blues—it's 
time to tackle the sample rate issue. Whatever the 
eventual real benefits for the professional and the 
consumer, the current relentless drive for higher 
sample rates is certainly very lucrative for the
hardware manufacturers. Clearly, engineers who 
must regularly replace their expensive high- 
resolution processors to keep up with the Joneses 
will spend big dollars. 

I’ve been working with higher sample rates for
several years,* but after some experiments that I will
relate below, I have concluded that most hardware
design engineers are having trouble seeing the
forest for the trees. I think that a fresh look at how
A/Ds and D/As are designed may reduce the need
for extreme sample rates!

A great number of engineers think that the
reason higher sample rate recordings sound better
is because they permit reproduction of extreme high
frequencies. They point out the open, warm, extended
sound of these recordings as evidence for this
contention.¹ However, most objective evidence
shows that higher bandwidth is not the reason for
the superior reproduction; remember that the
additional frequencies that are recordable by
higher sample rates are inaudible. But if we can’t
hear these frequencies, then why are we inventing
expensive processors and wasting so much
bandwidth and hard disc space? And how can
50-year-old ears detect differences between 44.1 kHz
and 96 kHz and even 192 kHz sample rates, even
though most of us can’t hear much above 15 kHz?


* I was the recording engineer for the world’s first 96 kHz/24-bit audio-only DVD.



I believe the answer lies in the design of
digital low-pass filters, which are part of the
requirements of digital audio. Digital filters are
used in oversampling A/D and D/A converters
and in sample rate converters. Digital filters
employ complex mathematics, which is expensive to
implement, and so cheaper filters have to include
greater quality tradeoffs, such as lowered calculation
resolution, ripple in the passband, or potential
for aliasing.

One type of filter has a sharp cutoff; the
consequences of sharp filtering include
time-smearing of the audio, and possible short
(millisecond) echoes which are caused by amplitude
response ripples in the passband frequency response
(20 Hz-20 kHz), even ripples as small as 0.1 dB.
Moving the filter cutoff frequency to 48 kHz (for
96 kHz SR) relaxes the filtering requirement and
makes it easier to engineer filters with less ripple in
the passband and less phase shift near the upper
frequency limit.

Oversampling 

One of the biggest improvements in digital
audio technology came in the late 80’s, with the
popularization of oversampling technology by DBX’s
Bob Adams, in a high-quality, 128X oversampling
18-bit A/D. An oversampling A/D converter has a
front end which typically operates at 64 or 128 times
the base sample rate and produces 1-bit to 5-bit
words in delta-sigma format, depending on the
model. In other words, for 44.1 kHz operation, the
input of a 128X converter actually operates at
5.6448 MHz! Oversampling takes the converter’s
noise, spreads it around a wider frequency spectrum,
and shapes it, moving much of the noise above the
audible frequency range. In addition, when it is
digitally downsampled to the base rate at the output
of the converter, some of the higher frequency noise
is filtered out, to yield as much as 120 dB or even
better signal-to-noise ratio within a 20 kHz bandwidth.
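
The front-end figures quoted above are simple multiplication. The sketch below reproduces the 5.6448 MHz number and the raw per-channel bit rate of the oversampled stream; the function names are made up for the example, and the word width of a given converter's front end is a parameter you would have to look up.

```python
def oversampled_rate_hz(base_rate_hz, ratio):
    """Front-end sampling rate of an oversampling converter."""
    return base_rate_hz * ratio


def front_end_bit_rate(base_rate_hz, ratio, bits_per_word):
    """Raw bits per second, per channel, leaving the oversampled front end."""
    return oversampled_rate_hz(base_rate_hz, ratio) * bits_per_word


if __name__ == "__main__":
    print(oversampled_rate_hz(44100, 128) / 1e6, "MHz")      # 128X at 44.1 kHz
    print(front_end_bit_rate(44100, 64, 1) / 1e6, "Mbit/s")  # 64X, 1-bit words
```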

The downsampling is accomplished with a
digital circuit called a decimator, which is a form of
divider or sample rate converter, and which must
contain a filter at half the sample rate to eliminate
aliases, requiring a 22.05 kHz cutoff at a 44.1 kHz
SR. This filter must be designed without
compromise or it will affect the sound. Some
manufacturers concentrate on transient response,
others on phase response, ripple, linearity, or
freedom from aliasing. But all of these
characteristics are important, and getting it right is
expensive: precision construction requires more
math, and math requires labor and parts (size of the
integrated circuit die). Thus, the filters in a typical
compact disc player or in the converter chips used
in most of today’s gear are mathematically
compromised.

On the D/A (output) side, at low sample rates,
sharp anti-imaging filters are required to retain
frequency response to 20 kHz. It is impractical
(probably impossible) to build a sharp analog filter
with the required characteristics, so instead an
oversampling or upsampling digital filter
multiplies the base sample rate up 2x to 8x or more,
moving artifacts and distortion above the audible
band. The higher sample rate permits using a
gentle, uncompromised analog filter. But the typical
digital filters used in the inexpensive chips have
poor performance. To minimize the effect of these
concessions, the most progressive high-end D/A
manufacturers add an additional upsampling filter
of their own design in front of the DAC chip. The
additional filter reduces the error contribution of
the chip’s own filter, in essence because the DAC’s
internal filter does not have to work as hard. Internally,
these advanced DACs are always operating at 88.2 or
96 kHz regardless of the incoming rate. At the
double sampling rates, the supplementary filter is
disabled. The supplementary filter would be
unnecessary if the manufacturers of the converter
chips used higher quality filters in the first place.

An Upsampling Experience 

Audiophiles, and some professionals, have been
experimenting with digital upsampling boxes which
are placed in front of D/A converters. In some cases
they report greatly improved sound. Although the
improvement may be real, in my opinion it can
be attributed to the various digital filter
combinations, not to bandwidth or frequency
response or (especially) the sample rate itself.
Remember that all original 44.1 kHz SR recordings
are already filtered, so they cannot contain
information above about 20 kHz. An upsampler
cannot “manufacture” any new frequency
information that wasn’t there in the first place.


I’ve compared the sound of upsamplers versus 
DACs working alone. Sometimes I hear an 
improvement, sometimes a degradation, sometimes 
the sound quality is the same either way. Sometimes 
the sound gets brighter despite a ruler-flat 
frequency response, which can probably be
attributed to some form of phase or
intermodulation distortion in the digital filter. Sonic
differences have come down to mathematics in 
this new digital audio world. 

The Ultimate Listening Test: Is It The
Filtering or the Bandwidth?†

In December 1996, I performed a listening test,
with the collaboration of members of the Pro Audio
maillist. The idea was to develop a test that would
eliminate all variables except bandwidth, with a
constant sample rate, filter design, DAC, and
constant jitter. The question we wanted to answer
was this: Does high sample rate audio sound better
because of increased bandwidth, or because of
less-intrusive filtering?

“The issues of the audibility of bandwidth and the
audibility of artifacts caused by limiting bandwidth
must be treated separately. Blurring these issues can
only lead to endless arguments.” -Bob Olhsson*

* From the Mastering Engineer’s Webboard.

† I previously published some of this information in Audio Media magazine; we
publish the full story in this book.

MYTH:
Upsampling makes audio sound better by creating more
points between the samples, so the waveform will be
less jagged.

The test we devised was to create a filtering
program that takes a 96 kHz recording and compares
the effect on it of two different bandwidth filters.
The volunteer design team consisted of Ernst Parth
(filter code), Matthew Xavier Mora (shell), Rusty
Scott (filter design), and Bob Katz (coordinator and
beta tester). We created a digital audio filtering
program with two impeccably-designed filters which
are mathematically identical, except that one cuts
off at 20 kHz and the other at 40 kHz. The filters are
double-precision dithered, FIR linear phase,
255-tap, with >110 dB stopband attenuation, and
<.01 dB passband ripple.
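
To make the idea of two otherwise identical linear-phase FIR filters with different cutoffs concrete, here is a minimal sketch using a Blackman-windowed sinc design at a 96 kHz sample rate. This is not the filter used in the test described above, whose design method is not given in the text, and a plain Blackman window will not reach the quoted stopband figure. The window choice, function names and variable names are all assumptions made for the illustration.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Linear-phase FIR low-pass coefficients (Blackman-windowed sinc)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count for a symmetric filter")
    fc = cutoff_hz / sample_rate_hz        # normalized cutoff, cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinitely long) low-pass impulse response, sampled.
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        # Blackman window tapers the truncated response.
        window = (0.42
                  - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4.0 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]     # normalize to unity gain at DC


def gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude response in dB at a single frequency (direct DFT sum)."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-300))


# Two filters that differ only in cutoff frequency.
taps_20k = windowed_sinc_lowpass(255, 20000.0, 96000.0)
taps_40k = windowed_sinc_lowpass(255, 40000.0, 96000.0)

if __name__ == "__main__":
    for f in (1000.0, 19000.0, 30000.0):
        print(f, round(gain_db(taps_20k, f, 96000.0), 2),
              round(gain_db(taps_40k, f, 96000.0), 2))
```

Because the coefficients are symmetric about the center tap, both filters delay all frequencies equally, so any audible difference between them would come from the cutoff alone.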

After the filter program was designed, I took a
96 kHz SR orchestral recording, filtered it and
brought it back into a Sonic Solutions DAW for the
comparison. I expected to hear radical differences
between the 20 kHz and 40 kHz filtered material,
but I could not! Next, I compared the 20 kHz
filtered against “no filter” (of course, the material
has already passed through two steep 48 kHz filters
in the A/D/A). Again, I could not hear a difference!
The intention was to listen double-blind; but even
sighted, 10 additional listeners who took part in the
tests (one at a time) heard no difference between
the 20 kHz digital filter and no filter. And if no one
can hear a difference sighted, why proceed to a
blind test?

I tried different types of musical material,
including a close-miked recording I made of
castanets (which have considerable ultrasonic
information), but there was still no audible
difference. I then created a test which put 20 kHz
filtered material into one channel of my Stax
electrostatic headphones, and the time-aligned
wide-bandwidth material into the other channel. I
was not able to detect any image shift, image
widening or narrowing: there was always a perfect
mono center at all frequencies in the headphones!
This must be a pretty darn good filter!

As a last resort, I went back to the list and asked
maillist participant Robert Bristow Johnston to
design a special “dirty” filter with 0.5 dB ripple in
the passband. Finally, with the dirty filter, I was able
to hear a difference... this dirty filter added a boxy
quality that resembles the sound of some of the
cheaper 44.1k CD players we all know.

This 1996 test seems to show that a “perfect 20
kHz filter” can be designed, but at what cost? Also
note that as this test was conducted in the context of
a 96 kHz sample rate, the artifacts of two other 48
kHz steep filters already in use may have obscured
or masked the effect of the filter under test. Since I
conducted my test, several others have tried this
filtering program, and most have reached the same
conclusion: the filter is inaudible. One maillist
participant, Eelco Grimm, a Netherlands-based
writer and engineer, performed the test and
reported that there were no audible differences
using the Sonic Solutions system, yet he and a
colleague were able to pick out differences between
filtered and non-filtered blind using an Augan
workstation. He did not compare the sound of the
20 kHz versus 40 kHz filters, so we are not sure if
he’s hearing the filter or the bandwidth, but I
believe he was hearing the filter, which must not be
ideally-designed. I believe the reason he did not
hear the differences on the Sonic system is that
perhaps its jitter was high enough to mask the other
differences, which must be very subtle indeed!






Regardless of whether Eelco’s group did reliably
hear the bandwidth differences, it should be clear
by now that the so-called “dramatic” differences
people hear between sample rate systems are not
likely to be due to bandwidth, but probably to the
filter design itself. Ironically, it was necessary to
make a high sample rate recording in order to prove
that high sample rates may not be necessary.

As I mentioned, 44.1 kHz reproduction has
improved considerably in recent DACs employing
add-on high-quality upsampling filters. The next
figure illustrates Weiss’s THD measurement of their
SFC, showing that its filter has textbook perfect
distortion and noise performance.

[Figure: 1 kHz sine, 0 dB, converted from 96 kHz to 44.1 kHz. Vertical axis
0 to -180; horizontal axis 0 to 21533. Caption: The distortion and noise
performance of a Weiss sample frequency converter.]

Why can’t more manufacturers introduce filters
of this quality into their converter chips? The
evidence all indicates that it will be a lot less
expensive for end-users if the manufacturers of
converter chips upgrade the filtering software in
their chip sets instead of directing us to this mad,
expensive sample rate and format war. Objective
experiments must be performed using state-of-the-
art digital filters to determine what is the lowest
practical sample rate which can be used without
audible compromise.

It’s A Matter of Time! 

Let’s be logical: since the human ear cannot
hear above (nominally) 20 kHz, then any artifacts
we are hearing must be in the audible band. It is
well-known that low-Q parametric and shelving
filters sound better than high Q; it’s not a stretch to
conclude this is also true for low-pass filters. Audio
researcher Jim Johnston,* who knows as much about
the time-domain response of the ear as anyone, has
shown that steep low-pass filters create pre-echos
which the ear interprets as a loss of transient
response, obscuring the sharpness or clarity of the
sound.

The pre-echo length is the inverse of the
transition bandwidth, so a sharp filter with a 500 Hz
transition would create a 2 ms pre-echo. Steep
filtering and its attendant transient degradation is
probably a reason why 44.1 kHz SR sounds less clear
than 96K. Likewise, the increased clarity and purity
of 1-bit recordings is probably due to their use of
gentle filters rather than some mumbo-jumbo about
the “magic” of 1-bit. Jim has experimentally
calculated that the minimum sample rate which
would support a Nyquist filter gentle enough to
elude the ear would be 50 kHz.³ I suggest that
manufacturers and engineers must test as soon as
possible the audibility of gentle low-pass filters, at
the more common sample rate of 96 kHz. It would
be trivial to build a 96 kHz SR A/D/A system with
the gentlest possible filter that’s flat at 20 kHz and
removes aliasing at 48 kHz, but no current chip
manufacturer has done so. This system can be
compared against the analog source, and against the
competing DSD recording system. If the
gentle-filtered PCM wins or sounds as good, it would
be the triumph of psychoacoustic research over
empirical design. Still, if it can be shown that
good-sounding DSD at the consumer end is cheaper
to implement than good-sounding gentle-filtered PCM
reproduction, it is cheaper for us to record and
process with gentle-filtered PCM and finally
convert to DSD for the consumer (this is how most
1-bit DACs operate anyway).

* In correspondence. JJ is the inventor of the science of perceptual coding, which
led to coding developments such as MP3, Atrac, etc.
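
The rule of thumb stated in this section, that pre-echo length is the inverse of the transition bandwidth, can be tried with a few numbers. The sketch below assumes the widest transition band available to a filter that is flat to 20 kHz and fully attenuating by half the sample rate; real converter filters may use a narrower band. The function names are invented for the example.

```python
def pre_echo_ms(transition_bandwidth_hz):
    """Pre-echo length in milliseconds, taken as 1 / transition bandwidth."""
    return 1000.0 / transition_bandwidth_hz


def widest_transition_hz(sample_rate_hz, passband_edge_hz=20000.0):
    """Widest possible transition band: from the passband edge up to
    half the sample rate."""
    return sample_rate_hz / 2.0 - passband_edge_hz


if __name__ == "__main__":
    print(pre_echo_ms(500.0), "ms for a 500 Hz transition")
    for rate in (44100.0, 50000.0, 96000.0):
        bw = widest_transition_hz(rate)
        print(rate, "Hz SR:", bw, "Hz transition,", round(pre_echo_ms(bw), 3), "ms")
```

On these assumptions the available transition band grows from about 2 kHz at 44.1 kHz SR to 28 kHz at 96 kHz SR, so the pre-echo shrinks by more than a factor of ten.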

I firmly believe that some minimal sample rate
(perhaps 96 kHz) will be all that is necessary if
PCM converters are redesigned with
psychoacoustically-correct filters (hopefully
inexpensively). For the benefit of the myriads of
consumers and professionals, we need to make a
cost-analysis of the whole picture instead of racing
towards bankruptcy.

The Advantages of Remastering 16/44.1 Recordings 
at Higher Rates 

Researchers such as J. Andrew Moorer of Sonic
Solutions, and Mike Story of dCS have demonstrated
theoretical improvements from working at a higher
sampling rate. Moorer pointed out that
post-production processing, such as filtering,
equalization, and compression, will result in less
distortion in the audible band, as the errors are
spread over twice the bandwidth, and half of that
bandwidth is above 20 kHz.⁴ Measurements
discussed in Chapter 16 confirmed these
conclusions. In addition, as we’ve seen above, if
after processing the destination is DVD-A or SACD,
then the master can be left at the higher sample rate
and wordlength, avoiding another generation of
sound-veiling 16-bit dither and yet another sharp
filter at the end of the process. Thus, consumers
should not scoff at DVDs which have been digitally
remastered from original 16-bit/44.1K sources.
They will be getting real, audiophile-quality sonic
value in their remasters.
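
The claim that spreading errors over twice the bandwidth lowers the in-band error can be put in numbers, on the assumption that the error is uncorrelated with the signal and spread evenly from DC to half the sample rate. That assumption does not hold for distortion products that track the signal. The function name is illustrative only.

```python
import math


def inband_error_reduction_db(rate_multiple):
    """Drop in in-band error level (dB) when the same total power of
    uncorrelated, evenly spread error is distributed over rate_multiple
    times the original bandwidth."""
    return 10.0 * math.log10(rate_multiple)


if __name__ == "__main__":
    print(round(inband_error_reduction_db(2), 2), "dB at double the sample rate")
    print(round(inband_error_reduction_db(4), 2), "dB at four times the sample rate")
```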


1 Other engineers who do not fully understand the nature of PCM argue that the
higher sampling rate sounds better because it would seem to create a more
accurate 20 kHz sine wave, as there are more “dots to connect” to describe the
wave. But this is erroneous; while there are more “dots,” in reality only 2
samples are necessary to describe an undistorted 20 kHz sine wave: the
low-pass filtering smooths out the waveform and eliminates all the glitches.

2 DSD, also known as 1-bit or Direct Stream Digital, a trademark of Sony and
Philips, is the format of the SACD and employs a form of Delta-Sigma
modulation. Delta-Sigma modulation is the very dense native coding format of
the first stage of modern-day oversampling converters, about 2.8 Megabits per
second, as opposed to 44.1 kHz/16-bit PCM, Pulse Code Modulation, which
runs at about 1.4 Megabits per second. When you study the block diagram of a
record-reproduce chain, the significant difference between using DSD format
and PCM is that PCM requires a steep Nyquist filter at half the sampling rate
(about 20 kHz with 44.1 kHz SR).

3 This is based on the length of the shortest organic filter in the human ear, and
Jim Johnston notes that the 50 kHz number nicely matches the original work
with anti-aliasing filters done by Tom Stockham for the Soundstream project.

4 Julian Dunn (in correspondence) clarifies: A 3 dB reduction in distortion
results because the error products are spread amongst twice the bandwidth.
This is true for uncorrelated quantization errors which fall evenly throughout
the frequency range from DC to fs/2, and does not work for distortion products
which will correlate with the signal. Jim Johnston (in correspondence)
indicates that processing at higher rates is required for any non-linear
processing, such as compression. These non-linear processes produce new
frequency components, some at higher frequencies. A high enough sampling
rate avoids aliasing of these new frequency components (see Cranesong and
Weiss FFTs in Chapter 16).



Chapter 19

Jitter: Separating the Myths from the Mysteries

I. Introduction


One of the least-understood and hardest-to-
explain phenomena in digital audio is jitter. To truly
understand the influence of jitter on your digital
recordings, you will have to reject years of analog
experience. In a classic Marx Brothers movie,
Groucho’s girlfriend catches him embracing
another beautiful woman. In defense, Groucho
quips, “Are you going to believe me, or your own
eyes?” Let me apply this to audio and ask, “Are you
going to believe the facts, or your own ears?” For in
this topsy-turvy digital audio world, sometimes you
have to abandon the evidence of your senses and
learn a totally new sense, but one that is fortunately
based on well-established physical principles.

In 19O0, because most sound systems, A/D and 
D/A converters and processors had such low
resolution, jitter errors were far down on the
priority list. Nonlinearity, noise modulation,
truncation, improper dithering, aliasing and other
errors created audible problems that tended to
swamp the effects of jitter. But today, where audio
performance frequently reaches 20-bit level* and
sometimes exceeds it, jitter has reared its ugly head.
The symptoms of jitter mimic the symptoms of
other converter problems: blurred, unfocused,
harsh sound, reduced image stability, loss of depth,
ambience, stereo image, soundstage, and space,
though usually in a subtle way, and it can take time
for even a critical ear to learn to identify them.

What causes these problems? Is our digital
audio actually being affected by jitter in our clocks?
The simple answer is: Sometimes yes, mostly no!
Should we believe our ears? It’ll take a whole
chapter to sort this one out.

* Rarely does typical equipment exceed 20-bit performance, as we shall soon see.

II. What is Jitter? 

Digital audio is based upon the concept of
sampling at regular intervals. To keep those intervals
constant requires a consistent clock. If the frequency
of the clock varies during A/D conversion, then
because the waveform will be at the wrong amplitude
at the wrong place when the digital audio is played
back, the audio will be permanently distorted.



[Figure: Jitter During A/D Conversion Creates Permanent Distortion
(Reproduced with a Clean Clock)]

MYTH:
Jitter reduction units improve the sound of digital
processors.

That’s why it is critical to have a consistent clock
during A/D conversion. Similarly, an inconsistent
clock will yield distortion during D/A conversion.

We call this inconsistency jitter. One period of a 44.1
kHz clock is 22.7 µs.* Amazingly, variations in the
duration of that period as short as 10 picoseconds
may cause audible artifacts, depending on the
quality of the reproduction system and your own
hearing acuity. As sample rate increases and
wordlength lengthens, jitter must be proportionally
lower to maintain sound quality, because jitter
affects the absolute noise floor. Jitter produces
sidebands (additional frequencies, or tones) that
mask inner detail in a recording.
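
To put the time scales in perspective, the sketch below computes the clock period (about 22.7 µs at 44.1 kHz) and a commonly used textbook estimate of the signal-to-noise ceiling that random clock jitter imposes on a full-scale sine wave, SNR = -20 log10(2π f σ). That formula is not from this chapter: it assumes random, uncorrelated jitter with RMS value σ, whereas real jitter often has a spectrum that produces discrete sidebands. The function names are illustrative.

```python
import math


def clock_period_us(sample_rate_hz):
    """Duration of one sample clock period in microseconds."""
    return 1e6 / sample_rate_hz


def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Approximate SNR ceiling (dB) for a full-scale sine of frequency
    signal_hz sampled with random clock jitter of RMS value rms_jitter_s.
    Textbook approximation; assumes white, uncorrelated jitter."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * rms_jitter_s)


if __name__ == "__main__":
    print(round(clock_period_us(44100), 2), "microseconds per period at 44.1 kHz")
    for jitter_ps in (10, 100, 1000):
        snr = jitter_limited_snr_db(20000.0, jitter_ps * 1e-12)
        print(jitter_ps, "ps RMS at 20 kHz:", round(snr, 1), "dB")
```

On this estimate, each tenfold increase in jitter costs 20 dB, and the ceiling falls as the signal frequency rises.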

We can measure jitter in two places:

1) interface jitter, the jitter present in the
interconnections between equipment, or
2) sampling jitter, the jitter in the clock which
drives the converter.

And we can measure the effects of jitter on
converters, using special analog and digital test
signals. If a converter has excellent internal jitter
rejection, then high interface jitter may not result in
sampling jitter. In other words, you can have a
jittery interface or cable, and it won’t matter a bit to
a well-designed converter. In this chapter, we are
mostly concerned with sampling jitter, because, as
we shall see, interface jitter is rarely important
unless it causes a breakdown in communication
between devices.



In the figure above, you can see that it is up to
the PLL† inside the DAC to create the sampling
clock. If it is a superb (very rare) PLL, then none of
the artifacts of incoming interface jitter will be
transmitted to the sampling clock.

* One microsecond (µs) is one millionth of a second. One picosecond is one
millionth of one millionth of one second, or 10⁻¹² second.
† PLL is a Phase Locked Loop. Its operation is explained later in the chapter.

III. Jitter, When it Matters, When it Doesn’t

If leaping to conclusions about jitter were an
Olympic event, sound engineers would win the gold
medal. An entire audiophile subculture has
developed around digital cables and jitter reduction
units in an attempt to achieve better reproduction,
which has led engineers to change cables
everywhere they hear that such a replacement makes a
difference, or to experiment with “stable” external
clocks, each of which produces a different sound.* I
don’t blame them for trying, but in general, cables
and wordclock generators are only bandaids for
jitter problems which must ultimately be solved
within the converters. No cable can remove the
inherent jitter problems in the AES/EBU-SPDIF
interface, because the embedded clock interacts
with the data stream. Thus, external jitter reduction
units will always be limited in their effectiveness
because jitter may be increased at the output
interface between the jitter reducer and the D/A.

Since engineers hear improvements with the 
better cables* 9 (and jitter reduction units) feeding 
their D/As, they conclude these same cables will 
improve their digital audio processors. But this is 
(largely) a misconception.† Remember: audio 
processors process data, not clock. If they hear a 
difference, it is because a cleaner clock is passed to 


* leading to the "Wordclock Du Jour" effect, as we shall see. And an erroneous 
audiophile magazine DAC review marvelling at a DAC "revealing" cable differences! 
† Shortly we’ll describe the infinitesimal number of exceptions. 


the D/A converter; but there is no difference in the 
data being processed. Believe the facts, not your 
own ears! The listening problem is ephemeral, and 
also has an immediate solution: get a better DAC! 

How to Lie With Measurements 

Clock jitter can produce insidious audio 
artifacts in converters. Most manufacturers’ 
specifications hide these artifacts because we have 
not yet established a measurement standard for the 
effects of jitter on converters. For example, some 
recent A/D (and a few D/A) converters now report 
exceptional >120 dB signal-to-noise ratios, 
theoretically equivalent to >20-bit performance, 
but is this true in practice? These figures are obtained 
by the traditional method of calculating signal-to- 
noise ratios: first measuring a full-scale signal, then 
removing the signal and measuring the residual 
analog noise. But this does not take into account 
additional noise (or distortion) when the signal is 
present. As far as I’m concerned, traditional audio 
signal-to-noise ratio measurements have (almost) no 
relationship to the sound of a converter when it is 
receiving signal. It is this which accounts for some of 
the previously-unexplained sonic differences 
between converters. Most signal-to-noise ratio 
measurements quoted in manuals are therefore 
irrelevant, and most people have never heard true 
20-bit performance, let alone 24. 
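
The gap between an idle-channel figure and signal-present behavior can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a measurement standard: the function name `simulate`, the Gaussian jitter model, the 1 ns rms jitter, the 10 kHz test tone and the idle noise level are all hypothetical values chosen for illustration.

```python
import math
import random


def snr_db(signal_power, noise_power):
    """Ratio of two powers expressed in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)


def simulate(freq_hz=10_000.0, fs_hz=44_100.0, jitter_rms_s=1e-9,
             idle_noise_rms=5e-7, n=20_000, seed=1):
    """Return (idle_snr_db, signal_present_snr_db) for a full-scale sine.

    Hypothetical model: every sampling instant is displaced by Gaussian
    clock jitter, and a fixed Gaussian idle-channel noise is added.
    """
    rng = random.Random(seed)
    err_power = 0.0
    for k in range(n):
        t_ideal = k / fs_hz
        t_actual = t_ideal + rng.gauss(0.0, jitter_rms_s)
        ideal = math.sin(2.0 * math.pi * freq_hz * t_ideal)
        actual = (math.sin(2.0 * math.pi * freq_hz * t_actual)
                  + rng.gauss(0.0, idle_noise_rms))
        err_power += (actual - ideal) ** 2
    err_power /= n

    signal_power = 0.5  # mean square of a sine with peak amplitude 1
    # Traditional method: signal removed, only the residual noise is measured.
    idle_snr = snr_db(signal_power, idle_noise_rms ** 2)
    # Signal present: the jitter-induced error now dominates the noise.
    loaded_snr = snr_db(signal_power, err_power)
    return idle_snr, loaded_snr


idle, loaded = simulate()
print(f"idle-channel SNR:   {idle:.1f} dB")
print(f"signal-present SNR: {loaded:.1f} dB")
```

With these assumed values the idle-channel figure lands above 120 dB, while the signal-present figure is tens of dB worse, because the jitter error only exists while a signal is being sampled.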







Digital Print-Through. Ideally, the converter’s 
PLL should completely reject incoming jitter with 
its clock-smoothing circuit, but if the PLL has 
inadequate jitter attenuation, it will pass some of the 
interface jitter to the critical conversion clock. The 
most egregious-sounding type of uneliminated 
jitter is signal-dependent jitter, caused by the 
design of external interfaces such as AES/EBU and 
SPDIF. Although signal-dependent jitter is 
analogous to analog tape flutter, it is very much like 
analog tape print-through because it is signal 
dependent and adds a blurred quality to the sound. 
Around 1975, analog tape manufacturer BASF 
demonstrated that an analog tape with lower print- 
through can sound cleaner and quieter than a tape 
with lower hiss level and higher print-through. 1 




Similarly, a converter which successfully rejects 
jitter can sound much cleaner than another with a 
lower absolute noise floor. Talk about lying with 
statistics! Jitter can produce signal-dependent effects 
(which yield distortion from intermodulation between the 
sample rate and the audio signal), random effects 
(which translate to a higher random noise floor 
which can also be signal-dependent), and discrete 
frequency effects (such as other clocks in the box 
producing random tones and intermodulation 
between the other clocks and the main sampling 
clock). Some of these effects are more benign to the 
ear than others, which is why it is so difficult to put a 
single meaningful number on jitter. 

Storage Media 

There is no jitter on a storage medium: only the 
data is stored, not the clock. Likewise, there is no 
clock on a compact disc. A new clock is generated on 
playback, and thus jitter comes into play only when 
data is clocked out of the medium. Bits are usually 
stored in a very irregular fashion; on hard discs, the 
data may be out of order, non-contiguous, and 
widely spread. Data stored on CD (in EFM format) 
must be unscrambled and decoded during playback, 
and DAT data is stored in separated blocks, but none 
of these storage formats can be called jitter, since 
time is not involved until the data is played back. So, 
if you’re looking for the causes of playback jitter, 
you have to study the complete mechanism. 

During playback, the amount of clock jitter on 
the output device is determined by the quality of the 
servo, buffering, and clocking circuitry that drives 
the data. Manufacturers differ widely in their 
abilities to keep outgoing clocks under control, and 
clock stability is simply not important to the 
original computer-based technology that we have 
now adapted to digital audio. In fact, the standard 
computer hard disc interfaces (e.g., SCSI, IDE) are 
asynchronous (non-clocked); they have a completely 
irregular output. The equivalent jitter of a SCSI 
interface is enormous, for at one moment, there 
may be no data; at another moment, it’s streaming 
at many times real time. When such non-clocked 
interfaces are used, it is the duty of following 
circuitry to make the data conform to a steady clock. 
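
One common way for the following circuitry to do this is a first-in, first-out buffer that is filled in bursts and emptied by a steady clock. The sketch below is a hypothetical illustration of that idea, not a description of any particular drive or interface; the function name `reclock`, the `prefill` threshold and the burst pattern are assumptions made for the example.

```python
from collections import deque


def reclock(bursts, prefill=4):
    """Turn bursty arrivals into one sample per output clock tick.

    bursts[i] is the list of samples that arrive during output tick i.
    Output starts once `prefill` samples are buffered; after that exactly
    one sample leaves per tick, however irregular the arrivals are.
    """
    fifo = deque()
    out = []
    started = False
    for arriving in bursts:
        fifo.extend(arriving)              # irregular, asynchronous input
        if not started and len(fifo) >= prefill:
            started = True
        if started:
            if not fifo:
                raise RuntimeError("buffer underrun: source too slow")
            out.append(fifo.popleft())     # steady, clocked output
    out.extend(fifo)                       # drain after the source stops
    return out


# Nothing for long stretches, then data at many times real time.
bursts = [[0, 1, 2, 3, 4, 5, 6, 7], [], [], [], [8, 9, 10, 11], [], [], []]
print(reclock(bursts))
```

The output order and content match the input exactly; only the timing has been regularized, and the quality of that timing depends entirely on the clock that empties the buffer.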




Digital Mixing and Processing 
Jitter does not affect the data... 

... when you are performing an all-digital mix in 
most digital consoles. After the initial analog-to- 
digital conversion (we hope with a low-jitter clock) 
the data can pass from processor to processor, from 
medium to medium regardless of clock jitter, just 
as long as the interface jitter is low enough to allow 
an error-free transfer. Similarly, clock jitter has no 
effect on the performance of most outboard digital 
equalizers, limiters, or compressors, which are 
nearly all state machines. A state machine is defined 
as any type of processor which produces identical 
output for the same input data, and which does not 
look at data timing or speed, but only at the state or 
recent history of the data. In other words, most 
digital processors are completely immune to 
jitter. With a state machine, you could make the 
clock completely irregular, or even slow it down to 1 
sample per second, and eventually, the processor 
would output all the correct data words. When these 
words were played back at the right speed and with a 
clean clock, all would be well. 
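
The state-machine idea can be sketched in a few lines. The one-pole lowpass below stands in for any digital equalizer or compressor; the class name, the coefficient and the three timing patterns are hypothetical choices for illustration only.

```python
import random


class OnePoleLowpass:
    """y[n] = a*x[n] + (1 - a)*y[n-1]; depends only on data and state."""

    def __init__(self, a=0.25):
        self.a = a
        self.y = 0.0

    def process(self, x):
        self.y = self.a * x + (1.0 - self.a) * self.y
        return self.y


def run(samples, arrival_times):
    """Process samples in order. Arrival times are carried along but never
    consulted, which is exactly what makes the processor a state machine."""
    lp = OnePoleLowpass()
    return [lp.process(x) for x, _t in zip(samples, arrival_times)]


samples = [((i * 37) % 101 - 50) / 50.0 for i in range(1000)]

steady = [i / 44100.0 for i in range(1000)]                    # clean clock
rng = random.Random(0)
jittery = [t + rng.uniform(-5e-6, 5e-6) for t in steady]       # jittery clock
slow = [float(i) for i in range(1000)]                         # 1 sample/second

same = run(samples, steady) == run(samples, jittery) == run(samples, slow)
print("identical output under all three clocks:", same)
```

Because the arithmetic never looks at when a sample arrived, the three runs produce bit-identical data words; the clock only matters later, when those words reach a converter.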

All current professional oversampling 
processors, such as equalizers and compressors, 
are state machines. They use synchronous 
converters to double the internal sample rate, and 
since synchronous converters are state machines, 
the same rule applies. 2 Any state machine can be 
implemented offline in a computer and without a 
clock, where real time and jitter have no meaning. 

With digital pitch processors such as 
Autotune™ the explanation is a bit confusing, but 
these are not affected by jitter. Pitch processors are 
not state machines; due to their randomizing 
algorithms many of these repitching processors 
produce a different output from the same piece of 
music each time they are run. But they do look at each 
sample coming in, one at a time, 
regardless of the regularity of the clock feeding the 
box. As you know, these repitchers can run offline 
in a DAW, at any speed, without a clock. 3 


"Don’t confuse the messenger with the message." (Andy Moorer) 


Jitter affects the monitoring 

Jitter usually becomes meaningful in a digital 
mix only during monitoring, when the data is 
clocked out of a D/A converter. This is where 
everyone gets hopelessly confused, like the 
girlfriend who caught Groucho Marx during his 
hijinks (he probably was guilty anyway). Let me 
emphasize: high jitter during the monitoring seems 
to affect the overall sound quality, but it really only 
affects that individual listening experience, and has 
no effect on the data. Don’t confuse the messenger 
with the message. 4 The message (the data) 
remains intact; so if it sounds funny, blame the 
messenger (the clock inside the monitor DAC). 
This is what I call "ephemeral jitter." If you improve 
your connections and the sound gets better, this 
does not mean that the digital equalizers are 
suddenly performing better; it only means that a 
cleaner clock is getting to the D/A converter. 




Jitter affects the data during a digital mix only... 

• when you leave the digital realm to use outboard 
analog processors, hence superior converters and 
clocking must be used for outboard equipment 
feeds. 

• Some digital consoles contain asynchronous 
sample-rate converters (ASRC). These types of 
SRCs use variable filters based on a continuously- 
running estimate of the incoming sample rate, 
and thus are sensitive to clock jitter. An 
asynchronous SRC is not a state machine and will 
produce a different output each time it is run. You 
should question the quality of any ASRC, and try 
to deliver to it the highest quality clock. This is a 
serious issue, especially in low-cost consoles, 
where clocks are often compromised for 
economy, and especially since the console is 
typically driven by an external (word) clock, which 
puts the burden of low jitter on a cheap PLL 
inside the console. I am not a fan of consoles that 
contain ASRCs, unless they can be completely 
bypassed when not needed. Modern-day ASRC 
chips contain sophisticated jitter-reduction 
algorithms and have relatively low distortion, so 
console performance is slightly degraded. The 
audible effect is a slight veiling or diminishing of 
stereo image stability, to my ears, about 90% of 
the original sound quality. Can we accept 90%? 
I’ll leave it to you to decide. 

Analog Mixing 

Clearly, jitter matters anytime a conversion 
takes place. Thus, when mixing with an analog 
console and digital multitrack, jitter is extremely 
critical. In contrast to the advice given by manufacturers 
of word-clock distribution devices, I 
recommend that mix engineers try running the 
multitrack or D/A converters on internal clock; it 
may sound better. The manufacturers of outboard 
clocking boxes are trying to sell you equipment 
which in all cases is a bandaid and not a cure, so 
investigate, inspect the measurements and test 
before you buy. In an ideal world, the converter 
should handle any reasonable clock feed or cable 
interface without affecting the sound, and there are 
now a handful of converters that meet that 
requirement. Authoritative measurements and good 
subjective tests are hard to come by, so cherish the 
magazine article or book that provides good 
information on the jitter performance of your 
favorite converter. Be aware that it is a lot easier to 
design a stable crystal clock than a PLL, which has to 
perform double-duty as an oscillator and reject 
incoming jitter. 5 Thus, any reasonably-designed 
converter or multitrack recorder can perform better 
on internal clock, and in a superior converter, the 
performance on external clock can only do as well as 
internal, but not better. If a converter does better 
on external, this should be seen as a criticism of the 
quality of the internal clock. 

However, outboard word clocks are useful for 
syncing non-conversion processes and in a perfect 
world should be used to drive anything but 
converters! I know this goes against the common 
"wisdom" but it does not contradict the basic 
principles of digital audio design. Later in this 
chapter, we present some measurements to help 
guide you, measurements you can duplicate with 
readily-available equipment and test signals. It’s 




amazing how few manufacturers take advantage of 
these simple measurement techniques, or perhaps 
they're too embarrassed to publish the data. 

Clock Stability Requirements for Converters 

An ordinary crystal oscillator is sufficient for a 
computer that processes data, but audio converters 
require an extraordinarily stable master oscillator. 
To get 20-bit performance at 44.1 kHz SR requires 
oscillator stability (jitter) at or below 25 
picoseconds peak to peak. 6 One nanosecond (1000 
picoseconds) in the time domain equates to 1 GHz, 
which is why a critical converter’s circuitry must be 
shielded and isolated from even the tiniest RFI or 
clock leakage that can enter via power supply, 
grounds, or emissions. Now it should be obvious 
why good-sounding converters are rare and 
expensive. 
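
To get a feel for why the requirement lands in the picosecond range, here is a back-of-envelope sketch. It uses one common rule of thumb (the timing error must not move a full-scale sine by more than half an LSB at its steepest point); this is an assumption for illustration and not necessarily the derivation behind the 25 ps figure quoted above, which depends on the test frequency and error criterion chosen. The function name is hypothetical.

```python
import math


def max_jitter_seconds(bits, freq_hz):
    """Largest timing error that keeps a full-scale sine within half an LSB.

    Assumes a sine of peak amplitude 1 (so one LSB is 2 / 2**bits) sampled
    at its zero crossing, where the slope is 2*pi*freq_hz per second.
    """
    half_lsb = 1.0 / (2 ** bits)
    max_slope = 2.0 * math.pi * freq_hz
    return half_lsb / max_slope


for bits in (16, 20, 24):
    limit_ps = max_jitter_seconds(bits, 20_000.0) * 1e12
    print(f"{bits}-bit, 20 kHz tone: about {limit_ps:.2f} ps")
```

Under these assumptions a 20-bit converter handling a 20 kHz tone tolerates only a few picoseconds of timing error, the same order of magnitude as the figure in the text, and every additional 4 bits tightens the limit by a factor of 16.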

The Effects of Internal Sync vs. Various External Sync 
Methods on Converter Performance 

There are two ways to clock a converter: 

a) via Internal Sync, where a stable crystal clock 
located inside the converter directly drives the 
circuitry. In an excellent design, a crystal clock 
located very close to the sampling clock pin of the 
converter chip will yield the best audio performance. 

b) External Sync, which usually requires a phase- 
locked loop (PLL), a critical and cantankerous 
circuit, the fundamental culprit of jitter-induced 
converter artifacts. The PLL has to filter jitter 
caused by poor source clocks and by interference 
along the cable which brings in the clock. Thus, the 
common use of unbalanced wordclock cables can 
produce ground loops in the clock signal itself. 


Examples of External Sync: 

i) AES/EBU sync, which is prone to signal-related 
jitter, as first illustrated by Chris Dunn and Malcolm 
Hawksford in their seminal AES Journal paper.* 
Thus AES/EBU "black" will produce a cleaner clock 
than AES/EBU with signal, with a typical PLL. But a 
"smart” PLL will not produce signal-related jitter, 
also known as program-modulated jitter or data- 
dependent jitter. 

ii) Wordclock sync, which can yield extremely low 
jitter, because the PLL required is simpler. Despite 
this, only a handful of the converters I’ve tested 
have inaudible degradation due to jitter under 
wordclock, and even fewer under AES/EBU! This 
means that you may have to re-evaluate your current 
converter choice if you want to obtain audiophile 
performance when locking to video. 

iii) Superclock sync, which may or may not require 
a PLL, depending on the frequency of the 
superclock and design of the converter receiving it. 
There is no such thing as a free lunch, and manufacturers 
must still pay attention to jitter issues with 
superclock. 

iv) Other Interfaces. The computer industry is 
continually reinventing the wheel, and the audio 
industry is about to adopt the latest wheel, a very 
jittery computer interface commonly known as 
Firewire or mLAN. For lower jitter, a supplementary 
wordclock or internal sync cable will be required. 
Let Firewire carry the data, but not the clock. I 
predict sound-quality will initially go downhill 
when Firewire takes over, until manufacturers pay 
better attention to jitter issues. 

* Dunn, Chris & Hawksford, Malcolm. Is the AES/EBU/SPDIF Digital Audio 
Interface Flawed? Journal of the AES, preprint 3360, October 1992. 





IV. How to Get the Best Performance 
from Converters 

A/D: Jitter Permanently Affects the Recording 

In 1988, all available A/D converters left me 
cold, so I built the world’s first working implementation 
of Bob Adams’ DBX oversampling technology, 
later purchased and refined by Ultra Analog. A few 
engineers latched onto this technology, and in my 
opinion, the quality of custom-built Ultra Analog 
converters was unbeaten for almost 10 years, when 
finally, a few high-end professional A/Ds arrived 
that sounded as good, and eventually, better. I 
always operate well-designed A/Ds on internal sync 
for best performance, unless doing video, when they 
must be locked externally. 7 The A/D should be the 
master clock in any system when recording, and the 
D/A the master when playing back. Remember, 
jitter in an A/D translates to distortion which can 
never be removed. 

D/A: Low Jitter Important for the Listening 

Until recently, professional D/A converters also 
left me cold, and for over 10 years, I resorted to 
using customized consumer (audiophile) units that 
exhibited, to my ears, superior depth, space and 
tonality. However, while professional units were 
slowly advancing in terms of jitter-immunity, most 
consumer and audiophile units - which were never 
meant to reject the high jitter levels encountered in 
a complex digital recording studio - were not. So I 
had to suffer from inconsistent sound depending on 
the source feeding the D/A converter. Only recently 
have a few professional DACs appeared with both 
good-sounding analog circuitry and virtual 
immunity to incoming jitter. 


In the year 2000, I installed a new converter 
into our mastering suite whose key to low jitter 
performance is having all converters operate from a 
common bus master clock, so there is no longer the 
question of switching clock when recording or 
playing back. The source of the bus clock can be an 
internal oscillator, AES/EBU, or wordclock. 8 There 
must be only one master clock in a system at any 
time. Every playback device (e.g., DAT, CD) must 
either slave to that clock, or must become the 
master. This raises a fundamental question of 
technique. How do you put the master clock where it 
belongs (inside the converters), and still be able to 
play back DATs and CDs? The solution is to use 
professional-quality transports that have external 
wordclock connections. 

I have tried all clocking possibilities with this 
new converter, which is highly immune to jitter on 
all its interfaces. Yet I heard and measured a slight 
improvement with each enhancement in clocking, 
with internal clock performing better than WC and 
much better than AES/EBU (as theory would 
predict). When I installed a CD transport that would 
slave externally, it was such a very pleasant surprise 
to hear CDs sounding better than ever that I took a 
pleasureful day off to enjoy some of my favorite 
music before going back to work! Jitter 
measurements seem to confirm these results, and 
lead to the conclusion that jitter artifacts must be 
near the noise floor to become inaudible. 

So, what does it take to make a superior 
converter that produces only inaudible effects from 
jitter? The answer is time, research, and critical 




design implementation. The engineers who 
produced this superior converter spent one man- 
year on the phase locked loop alone, and a further 
year on the converter details. Successful converter 
manufacturers must master the techniques of PC 
board layout, grounding, internal clock distribution, 
and immaculate separation of digital and analog 
signals. Things are looking up. But caveat emptor. 

Do we need to worry about cables, which 
produce sonic differences with jitter-susceptible 
converters? When I was using a jitter-susceptible 
converter, I spent a long effort cleaning up cable 
runs, using proper-impedance cable, avoiding 
ground loops, etc., and this resulted in improved 
monitoring. But really, you can mismatch 
impedances (e.g. 110 ohm to 75 ohm) with no 
concerns that jitter will affect the data. However, at 
high sample rates, impedance mismatches are more 
likely to cause poor signal transmission (and 
incidentally, high interface jitter), resulting in 
glitches or dropouts, so it’s wise to get your cabling 
act together. Balanced digital connections can also 
reduce RF radiation into sensitive analog stages, and 
improve the performance of jitter-sensitive 
converters. 

The Internet and Jitter 

As studios begin to collaborate through the 
Internet, jitter issues will be even more challenging, 
since DSL and T1 lines are notoriously jittery. 
Perhaps it may be possible to use a master clock 
based on a GPS satellite clock, provided that a GPS- 
derived clock can drive a converter with the 
required low jitter. Or perhaps the solution will be 


to install an elastic buffer where the 
Internet sources enter the building. 

V. Stop Leaping to Conclusions: Real World 
Examples 

Let’s apply some of the 
principles we’ve discussed. The 
names have been changed to protect 
the misinformed! 

Example A: Digital Copying and Jitter 
Reduction. 

Engineer Betty would like to do 
some Digital Copying (cloning), 
from CD to DAT. First she notices 
that her CD recorder sounds better 
than her DAT machine. The reason 
is that the internal clocks of typical 
DAT machines are not as clean as 
those in CD players (perhaps 
because they have more motors to 
interfere with the electronics). 9 But 
mostly she’s concerned about the 
sound differences she hears: her 
DAT machine sounds better on playback than on 
record! She tries inserting a "jitter reduction unit" 
before the DAT, hoping to make better dubs, but this 
only creates more puzzles: now it sounds better 
during dubbing than when it is played back! What is 
going on here? 


The Miracle of the 
Blessed DAT Resurrection 

Always wanting to see how far equipment can 
be pushed, I decided to demonstrate the 
veracity of the laws of physics at an AES 
Convention around 1992. I built a special clock 
oscillator whose jitter could be altered, and 
connected it to a DAT machine, thus 
simulating conditions such as mismatched 
cable impedances or extreme problems with 
clocking. The digital out of this machine was 
then connected to another DAT machine to 
make a dub. While monitoring the record 
machine in E-E,* I increased the jitter of the 
source machine until the record machine 
exhibited serious distortion on its analog 
output; it sounded like a vocoder in overload. 
Anyone listening to this machine’s output 
would conclude it was broken and that it was 
making a defective recording. But believe the 
facts, not your own ears, because on playback, 
there was no trace of the distortion; the 
playback sounded very clean. Thus 
demonstrating that digital dubs are not 
susceptible to jitter.† 


* E-E is a term commonly used in video meaning "Electronics To Electronics," 
when a machine is in record monitor mode as opposed to playback. 
† Or you may choose to conclude from this example that DAT dubbing via 
AES/EBU may only be susceptible to jitter in the most subtle or imperceptible 
way, which I do not believe to be the case. 







Digital copies really are perfect (as long as the 
playback deck is in good condition and not interpolating 
digital errors). Illustrated below, the DAT 
machine drives its DAC from two choices of clock; 
during record it depends on the phase locked loop 
to generate a clock from incoming clock, and during 
playback it uses its internal oscillator. The reason 
the DAT machine sounds better on playback is that 
its internal clock is probably more stable than its 
PLL. However, the message isn’t changing, only 
the messenger delivering it. And when Betty 
inserts the jitter reduction unit, it’s no surprise that 
record mode now sounds better than playback: 
since DAT machines typically are built to a price, a 
$2000 jitter reduction unit helps the PLL 
produce a cleaner clock than the 
machine’s own 25 cent oscillator! 10 

But Betty’s jitter reduction unit does not 
improve the copy in any way (though you won’t hear 
this story from the manufacturers of jitter- 
reduction units).* She can prove that there is no 
problem with the DAT by listening under identical 
jitter conditions. For example, she can copy the DAT 
back to the CDR and play the two CDR tracks back to 
back. What conclusion must she draw about the DAT 
tape if the two CDR tracks sound identical? 



A consumer-model DAT machine switches its master clock source 
between record and playback mode. 


* The author has a collection of surplus high-end jitter-reduction units in his 
garage, available at bargain prices. 


Example B: Copying via SDIF-2 versus AES/EBU 

Engineer Don has concluded that DASH 
recorders make cleaner digital copies through the 
SDIF-2 interface than through AES/EBU, because 
he knows that SDIF-2 is a "cleaner" interface. His 
experience has been that the SDIF-2 interface 
makes a DAC sound better via its separate, clean 
word clock, while AES/EBU embeds a (jittery) clock 
in the data stream. And since the DASH tape 
recorder sounds better to Don, he concludes that 
the DASH tape copy is better than the DAT tape 
copy. But it is equally feasible that it’s the DASH 
machine itself that "sounds better,” not the tape. 
Both the DAT and DASH tape make equivalent 
masters, except the DASH tape uses more robust 
error correction and will probably last longer. 

Don can prove his own conclusion to be false by 
taking the "questionable" DAT copy and playing it 
on a DAT machine equipped with the SDIF-2 
interface, preferably slaving the DAT to wordclock. 
He’ll probably find the DAT copy now sounds as 
good as the DASH. Regardless, Don should also 
invest in one of the new jitter-immune DACs, which 
can make the SDIF-2 interface unnecessary. 

Example C: Clock Accuracy? 

Ray was told that an accurate crystal wordclock 
fed to all of his gear would make it sound better. The 
operative word here is not accuracy but rather 
stability. For jitter removal, stability counts more 
than absolute accuracy. A crystal may produce 
44,100 Hz on the average, but a jittery crystal 
oscillator deviates above and below that average. In a 
totally digital production studio, even if the master 



















crystal is several Hertz off, and even if that causes an 
audible pitch error, the end result will sound correct 
when reproduced with a correct crystal. If I’m in a 
hurry, I can speed up my clock to 48 kHz, or even 
faster if the equipment supports it, and still make a 
valid dub at high speed. This illustrates the fact that 
jitter cannot influence the accuracy of a dub: we can 
speed up the source to a frequency 10,000 times 
greater than the frequency deviation due to jitter 
and still make a perfect data copy! Dubbing is done 
on a sample by sample basis; the job of the clock is 
simply to deliver succeeding samples into the 
queue. 11 

Example D: Mixing down via internal or 
external sync? 

A recent magazine article purported to evaluate 
the "sound" of wordclocks. But wordclocks have no 
"sound"; what counts is the ability of the converter 
to reject jitter on the incoming wordclock and pass a 
clean clock on for conversion. Engineer Fred says 
his multitrack sounds much better with a new 
wordclock generator than with the old one. I don’t 
doubt it, but Fred should investigate putting his 
multitrack on internal clock, which is a lot easier to 
design well than a PLL. Note that if Fred is 
performing an analog mixdown, he can run his 
mixdown A/D on its own (independent) internal 
sync, and get the best of both worlds. 

Example E: Load-in jitters? 

Engineer Jeff thinks that digital load-ins made 
through his DAW’s S/PDIF input sound better than 
those made through its jittery Toslink optical input. 
But he’s mistaken; the sonic difference is 
ephemeral. It will only be present during the load-in, 
and the DAW’s playback will actually sound better 
than the load-in! And Jeff will only notice this if he 
uses an inferior DAC which is susceptible to 
differences in clocking. Rest assured that interface 
jitter or clocking differences have no effect on the 
integrity of a digital load-in from a digital source. 

In summary, we should not blame clocks for 
problems that should be fixed in the converter. And 
we should stop working on minimizing jitter in the 
digital processing chain, and instead concentrate on 
ways to reduce the jitter at the sampling clock inside 
the converters. 

VI. Concern for the rest of the world... 

Since the whole world is probably listening to 
music on inferior D/A converters, it’s very 
important that the CDs (and DVDs, or SACDs) we 
cut for them have the best possible sound. As I said, 
there is no jitter on a storage medium, but there is 
some (controversial) evidence that CDs cut at high 
speeds sound inferior to CDs cut at low speeds, and 
that CDs cut with a jittery clock sound worse than 
those cut with a clean clock. 12 We theorize that 
certain mechanical parameters of the disc are 
altered by the cutting speed, making it more 
difficult for the CD player’s servo mechanism, 
passing the varying servo load to the CD player’s 
power supply and thus affecting the stability of the 
master clock. It only takes a few picoseconds to 
make an audible difference. Regardless of the 
theoretical reasons why this might be happening, 
it’s important to note that the CD difference is an 
ephemeral and correctable phenomenon, clearly 
related to some difficulty of the CD player only 




[Figure: oscilloscope photo. Upper trace: start of Channel A preamble. Lower trace: start of wordclock (down). Marked offset: 28% timing error.] 

This oscilloscope photo compares the timing of the start of the Channel A AES preamble against the start of wordclock at the output of a digital processor. This timing offset of 28% of the length of the AES frame is 3 points greater than the permissible tolerance in standard AES11 and would cause locking trouble to intolerant consoles or DAWs or other receivers. 


during playback, and that the data itself is 
unchanged. The differences are no longer audible 
when played over a jitter-immune DAC. Because 
there is no permanent distortion in a D-D dub (as 
would be the case in A/D conversions) the output of 
the CD player can be reclocked to make the apparent 
audible differences inaudible. Time and again I 
have observed that when the clocking has been 
fixed, formerly audible differences disappear. 
However, until everyone else has perfect D/As, it’s 
important for the CD production 
plants to heed the audible 
evidence, and cut glass masters at 
1x speed and find other ways to 
make the best-sounding CDs. 

By the way, for those listeners 
with inferior DACs (the majority), 
I always find that I can restore the 
sound quality of an "inferior" CD 
by copying it back to a workstation 
and then outputting on a good 
SCSI writer at 1x speed. In this 
case the dub does sound better 
than the original! It is technically 
impossible for previous jitter to 
be passed through an 
asynchronous interface such as SCSI to the final 
sampling clock. 

VII. Things That Go Bump In The Night 

Framing and Timing Errors 
Wordclock to AES timing error 

Although jitter is often made the scapegoat for a 
motley of problems in digital audio, the fact is that 
99% of the time, glitches, clicks, dropouts, noises 
and lockup problems are caused by framing 
problems, not by jitter at all. Framing problems are 
caused by timing differences in critical signals and 
cannot be solved without equipment software or 
hardware modifications. At left is an oscilloscope 
photo, at the top of which is the start of the AES 
preamble (which defines the beginning of the AES 
data word), and on the bottom, the point where 
wordclock changes from high to low. 

To complicate matters, there is no standard that 
defines which wordclock transition (low to high or 
high to low) the AES preamble should line up with. 
This is a timing difference of 180 degrees, or 
approximately 11 µs at 44.1 kHz, which is enough to 
drive workstations, processors and consoles batty, 
producing glitches, or no signal at all. Fortunately, 
my workstation has a menu choice that allows us to 
choose the wordclock phase, making it more 
compatible with products of various manufacturers. 
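
The arithmetic behind these timing figures is easy to check. The sketch below converts a fraction of one sample period into microseconds; the function names are hypothetical, and the 25% tolerance is inferred from the oscilloscope caption (28% being "3 points greater" than the permitted amount), not quoted from the standard itself.

```python
def frame_period_us(sample_rate_hz):
    """Length of one sample period (one AES frame) in microseconds."""
    return 1e6 / sample_rate_hz


def offset_us(sample_rate_hz, fraction_of_frame):
    """Timing offset in microseconds for a given fraction of one frame."""
    return frame_period_us(sample_rate_hz) * fraction_of_frame


fs = 44_100
half_cycle = offset_us(fs, 0.50)   # wrong wordclock edge: 180 degrees
measured = offset_us(fs, 0.28)     # offset shown in the oscilloscope photo
tolerance = offset_us(fs, 0.25)    # assumed limit, inferred from the caption

print(f"frame period:      {frame_period_us(fs):.2f} us")
print(f"180-degree offset: {half_cycle:.2f} us")
print(f"28% offset:        {measured:.2f} us (limit {tolerance:.2f} us)")
```

At 44.1 kHz the frame lasts a little under 23 microseconds, so choosing the wrong wordclock edge costs roughly 11 microseconds, about twice the assumed tolerance, which is why a phase-select menu can make the difference between lock and no lock.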

AES to AES framing error 

Digital audio is a small industry, still experiencing 
growing pains. Many current digital consoles 
and DAWs are oversensitive to timing problems, 
which I must stress are unrelated to jitter. And since 
some digital audio processors produce an AES 
output that is out of timing with their AES input, 
intolerant consoles and workstations have trouble 
locking to them (illustrated at right). Once I was 
forced to insert a simple reverb unit via analog, 
because the digital console would not lock to it on a 
digital send/return path. The fault was caused by the 
console’s intolerance to AES framing errors, 








aggravated by the reverb unit's output being slightly
out of framing (timing), as seen in the following
figure. You can probably prove it's a framing
problem without measurement equipment: in this
situation, set the digital processor to run on its
internal clock, and lock the console to the external
processor on its reverb return. If the console will
lock and pass audio from the external processor,
then the previous problem was a framing problem.

Locking the console to wordclock would
probably not help and may even worsen the
situation since the timing difference between the
AES sources would remain. Framing errors are
cumulative in a chain of processors if they are
chained via AES/EBU (or S/PDIF). If the framing
error of each box is in the same direction, then the
total error could be enough to cause locking
problems in sensitive consoles and DAWs. You may
be able to stabilize the system by locking the last
processor in line to external sync (wordclock or
AES). If the last processor in line is framing-tolerant
on its AES input, then locking it to external
sync will force its output to a known framing and
hopefully to within the tolerance of the DAW. It's
also possible to build an outboard box that will fix
this sort of framing problem, but really, the burden
is on the manufacturers to produce consoles and
processors that are within the AES standard
tolerances.¹³ Again: Caveat emptor.¹⁴

Figure: How AES to AES framing error can cause locking problems

Off-Center Clocks 

Another problem mentioned earlier is loss of
lock caused by an off-frequency master crystal and a
sensitive PLL. Some digital inputs have a very low
tolerance to incorrect center frequency (which also
makes them uncomfortable when varispeeding). If
you have locking problems not due to framing
errors, confirm that the source frequency (e.g., 44.1
kHz) is correct, and if not, have the master crystal
oscillator trimmed.

VIII. How It Works 

Simple in Theory... 

Most engineers don't need the heavy technical
details of how equipment works, but there are
usually a couple of nagging questions, like...

What is a reclocking circuit? Why do we need a
high-frequency clock?

Reclocking Circuit. The data inside typical
audio processors travels from chip to chip serially,
that is, bit by bit. A clock pulse moves this data
along. This clock bus is distributed to all the critical
chips inside the box. As we've seen, it doesn't
matter if this clock is jittery; proper data still makes
it to the next chip in line. But sometimes data needs
to be reclocked, for instance when feeding a
D/A converter. Pictured here is a simple
reclocking circuit; on the left side is an incoming
data word that's been clocked by a jittery clock; the
data value is (conveniently) 10101010. This word
passes, one bit at a time, into a logic circuit called a
D-type flip flop, which is being fed a clean clock.
Almost magically, the data neatly marches out of the
flip flop, and in theory, all the jitter is gone and the
data is ready to feed the DAC. Notice how the clean
clock's pulses permit the flip flop to properly
"sample" each data value, but only if the clock pulse
lands within the acceptance time of each incoming
bit. In this illustration, the fourth (and eighth) data
bit is in danger of being missed if it arrives a moment
later, in which case the clean clock would land on the
previous bit and the wrong data would be output.
Fortunately, typical audio sources have much less
jitter than in this illustrative example. Otherwise the
system would break down and we would get
glitches, clicks or hash instead of clean audio, and
then the output data is really being changed!¹⁵

Figure: A Simple Reclocking Circuit (jittery data word 10101010
entering a D Flip Flop that is driven by a Clean Clock)
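The reclocking behavior described above can be imitated with a toy timing model. This is only a sketch under stated assumptions: time is measured in bit cells, each bit simply "arrives" early or late by a jitter amount, the clean clock samples in the middle of each nominal cell, and the function name `reclock` is made up for illustration. Real flip flops have setup and hold times that this model ignores.

```python
def reclock(bits, edge_jitter, period=1.0):
    """Model a D flip flop sampling jittery serial data with a clean clock.

    bits[i] appears on the D input at i*period + edge_jitter[i] and stays
    there until the next bit arrives. The clean clock samples in the middle
    of each nominal bit cell. Assumes |edge_jitter| < period.
    """
    arrivals = [i * period + j for i, j in enumerate(edge_jitter)]
    out = []
    for n in range(len(bits)):
        t_clock = (n + 0.5) * period
        latched = None                      # nothing has arrived yet
        for i, t_arrive in enumerate(arrivals):
            if t_arrive <= t_clock:
                latched = bits[i]           # most recent bit to arrive
        out.append(latched)
    return out

word = [1, 0, 1, 0, 1, 0, 1, 0]
mild = [0.1, -0.2, 0.3, -0.4, 0.2, 0.0, -0.3, 0.4]   # within half a cell
late = [0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0]      # fourth bit arrives late

print(reclock(word, mild) == word)   # True: jitter removed, data intact
print(reclock(word, late))           # [1, 0, 1, 1, 1, 0, 1, 0]: fourth bit wrong
```

In the second case the clean clock lands while the third bit is still present, so the fourth output bit repeats the previous value, which is the failure the text warns about.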



Figure: Wordclock in (44.1 kHz) feeds a PLL, which outputs the
high-frequency bitclock ("Superclock"). 44,100 Hz wordclock;
1,058,400 Hz bitclock for 24 bits. A PLL is needed to generate the
higher frequency clock required to move the individual bits from
place to place.

Why PLL? The figure at bottom left illustrates why a phase-locked
loop (PLL) is needed. If we are passing 24-bit audio
bit by bit, then we need a high-frequency clock
pulse that is 24 times the frequency of wordclock.
Wordclock enters the device, and has to be
multiplied up to the higher frequency to drive those
bits around, known as the bitclock. It's easy to divide
down without creating jitter, but very difficult to
multiply up, and it's the job of the sophisticated
circuitry of the PLL to create the higher frequency
while reducing incoming jitter.¹⁶ A PLL is a sort of
electrical flywheel; it tries to find a center, holding
reasonably steady while still following the average
frequency of the incoming source.
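The clock multiplication is easy to check by hand. A small Python sketch of the arithmetic follows; the function name is illustrative, and the 32x and 256x multiples are taken from the footnote on bitclocks rather than from any particular device.

```python
def bitclock_hz(wordclock_hz, bits_per_word):
    """Serial bitclock needed to move one word per wordclock cycle."""
    return wordclock_hz * bits_per_word

print(bitclock_hz(44_100, 24))   # 1058400, the figure quoted in the text
print(bitclock_hz(44_100, 32))   # 1411200, a 32x bitclock
print(44_100 * 256)              # 11289600, a 256x "superclock"
```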

...Complicated In Practice 

What makes these circuits so difficult to design
well is that at high frequencies, leakage from the
jittery portion of the circuit can travel through back
paths to contaminate the clean portion of the
circuit. These paths include power supply and
ground. Couple that with outside interference and
ground loops, and you have an analog designer's
nightmare. An error of 10 picoseconds can make the
difference between an 18-bit and a 20-bit noise floor.
Some manufacturers use a dual-PLL, where the first
is an analog circuit, and the second a voltage-controlled
crystal oscillator (VCXO), in an attempt
to get the jitter down to that of a quartz crystal.
Unfortunately, designs using VCXOs cannot
varispeed because of their narrow frequency
tolerance. It is difficult, yet possible, to design a
jitter-immune PLL that's as good as a crystal, has
wide frequency tolerance and quick lockup. No
matter how many PLLs it says on the label, the
quality of a designer's work should be tested
objectively.*




* I once owned a Swiss watch that said "17 jewels" on the label, but only 5 of them
rattled when I shook the watch case. What does the label really mean?



















IX. Jitter Measurements 

Here are some jitter measurements made on
D/A converters. Before you buy an expensive
console or conversion system, you can take
measurements like these yourself, using readily-available
test equipment. You'll be shocked at the
variance in performance from one model to
another. It's unfortunate that magazine reviewers
and editors like to see single-number performance
(e.g., this converter has an intrinsic jitter of 40 ps),
which means little technically and nothing
psychoacoustically. What we need to see are detailed graphs
of the deterioration of a converter's performance
when it is fed a jittery signal, and this is the least
that we should expect from a magazine reviewer.

A/D and D/A converters can be tested for the
effects of jitter using a very high frequency sine-wave
test signal, but the test signal must be very
pure and frequency-stable, probably crystal or
digitally-generated. For these DAC tests, I used the
J-Test signal invented by Julian Dunn, an
independent consultant best-known for his work on
the Prism brand of converters.¹⁷ The 24-bit J-Test
signal was not available, so the 16-bit version was
used; we'll have to ignore some artifacts that are part
of the source signal. Here are a few guides: The
lower the noise floor, the less jitter. We have not
fully learned which jitter spikes are
psychoacoustically important, but, as I have said before, my
listening tests show that jitter must be very low
(close to the system noise) to be inaudible. Also,
since test equipment varies, your J-Test results will
be different from mine, but relative rankings will
likely remain.


In the color plates section, color figure
C19-01 shows, in red, the noise floor of my
UltraAnalog A/D (which I used to sample the
outputs of various D/As under test), and in blue, the
artifacts of the 16-bit J-Test signal, which are at
-13? to -135 dBFS.¹⁸ This means if we appear to
measure jitter in the device under test below -13?, it
may simply be due to artifacts of the test signal. I
think it's more important to look at how the jitter
artifacts affect the DAC's own noise floor and at
what particular frequencies, than to calculate the
actual jitter value in picoseconds.

Color figure C19-02 in the Color Plates section
shows a considerable measured difference in jitter
performance when an inexpensive consumer D/A is
fed from two different sources. A cheap consumer
CD player yields the highest output jitter, with the
output of Sonic Solutions even less. If this were a
linear display instead of semi-log, it would be more
obvious that jitter usually produces paired artifacts
around the center frequency, usually at equal
deviation about the center. Compare the consumer
D/A's performance to that of the excellent
"jitter-immune" TC Electronic System 6000 D/A. When
fed from either of two sources, the TC's jitter is
effectively identical and just about as low as its
quiescent noise floor!

Color figure C19-03, in the Color Plates
section, shows that sync mode hardly affects the
TC's jitter performance, with extraordinary
measurements in internal sync and slight
differences when locked via AES/EBU. When slaved
via AES/EBU it produces very slightly more jitter
(only the two discrete frequency blue lines circa
-117 closest to the center frequency). When on
internal sync (red trace), its jitter is nearly as low as
the UltraAnalog's noise floor, and realize that most
of the grass is the 16-bit J-Test signal itself. I can
hear a slight degradation in sonic clarity, a smeared
image and brightness when the TC is slaved to
AES/EBU, which implies that the black-colored
spikes at approximately -117 dBFS may be audibly
significant.

The Weiss is the first DAC I have measured with
no apparent trace of discrete frequency jitter in its
output when locked via AES/EBU (Figure C19-04 in
the Color Plates section). Instead, its noise floor
rises with the test signal and the jitter "skirts"
appear to widen; all incoming jitter has been
converted to random noise. Or is the sonic
improvement due to euphonic coloration (higher
noise floor masking discrete jitter components,
since we can no longer see the floor of the test signal
itself)? This also brings up concerns about potential
converter noise modulation with signal, which may
mask low level signals or reverberation. However,
low-amplitude, random noise is the most benign
signature one could wish for and the DAC sounds
great. I did not test the DAC on internal sync.

In Conclusion: When it comes to jitter, there's
a lot more to know than what meets the ear! Until
our audio systems have advanced to the point where
all data-identical sources sound identical, we
cannot make valid judgments about sound
character.

1 For those of you who were born after the era of analog tape, print-through is a
phenomenon where one layer of magnetic tape magnetically imparts some of
its signal on the adjoining layer. After years of storage, it is possible to hear two
or three repeating echoes in the tail decay of a song recorded on analog tape
(which can be repaired by the "adding tails" technique explained in Chapter 7).
But even when print-through does not provide a distinct echo, it is always
there to some extent, affecting the clarity of the sound. A "low-print" tape has
less print-through than a high-print tape.

2 Technically speaking, any processor which adds random dither is not a state
machine. All good SRCs and equalizers employ internal dither to linearize the
process. The randomizing effect of dither means that each output pass will
produce slightly different data at each instant. However, on the average, the
output stream is really the same, and if we could subtract the random dither
from the output signal, each pass would be identical. You can prove this by
running two passes through the same equalizer, lining them up, and
subtracting one from the other. You would be left with no residual of the signal,
only random noise, thus proving the processor is a state machine. By the way, if
you tried this with an ASRC, you would hear a small amount of residual signal
in the noise floor, proving that an ASRC is not a state machine. Many digital
processors create dither using a pseudo-random sequence, which predictably
repeats after a period of time, so they are perfect state machines, if when
comparing two successive passes you can find the moments where the two
dither signals exactly line up!

3 When transferring from file to file, there is no clock at all and the process just
deals with one sample after another. I sometimes explain jitter with a bowling
hall analogy. Throw a series of bowling balls, some white and some black, down
the alley. Although their timing is irregular, when they land back on the stand,
the white and black are in the same data order. Digital processors look at the
samples (bowling balls), not at the time they arrive, so the output data is
identical, even if the timing is irregular.

4 As illustrated in a demonstration of extreme jitter that I performed at the AES
Convention, when the audio was so distorted by a jittery clock that it was
unrecognizable, but the data remained intact. See the accompanying sidebar.

I'd like to thank Andy Moorer for coining the message/messenger dichotomy in
understanding jitter.

5 The external wordclock replaces the signal from the quartz crystal with a PLL.

In the vast majority of converters manufactured today, "the jitter caused by
wordclock is typically 15 times higher than when using a quartz based clock,"
according to the manual for the RME model ADI-8 DD format converter. For
mid-priced converters where little attention was paid to the internal clock
design, I've seen some surprising situations where internal clock is not as good
as external, but it is much cheaper and easier to design a good quartz clock than
a powerful PLL. External clock can never perform better than a
reasonably-designed internal clock, because of all the jitter-producing obstacles involved
in clock extraction and regeneration (the job of a PLL). In fact, I go so far as to
say that any digital console or DAW interface that does not perform equally or
better on internal clock is a defective design that is definitely not living up to
its full sonic potential. Insist on full-disclosure and FFT-based jitter
measurements before buying.

6 According to a simplified formula from Don Moses, "Enclosure Detuning for
20-Bit Performance," Journal of the AES preprint 3440, October 1993:

The following expression utilizes Carlson's similar triangle analysis method
and is useful for the case where: (1) the jitter deviation is small compared
to the sampling interval, (2) distortion is measured at the zero-crossing of
a sine wave, (3) the peak-to-peak amplitude is normalized to 1 V, and (4)
the maximum slope is approximated as 2 x the information bandwidth:

Resolution (in dB) = 20 log (time deviation x 2 x information bandwidth)

For example, 25 ps of jitter, 20 kHz information bandwidth, yields:

20 log (25 ps x 2 x 20 kHz) = -120 dB, which provides 20-bit resolution.

In other words, if you double the sample rate to 88.2 kHz (the information
bandwidth becomes ~40 kHz), the same amount of jitter reduces signal to
noise ratio by 6 dB. For 20-bit performance at 88.2 kHz, if you consider the
information bandwidth goes to 40 kHz, you would need to halve the jitter to
less than 12 picoseconds. And for each 6 dB improvement or 1-bit increase in
wordlength, you must halve the jitter yet again. Even if you limit the
information bandwidth to 20 kHz, in order to get excellent performance with
long wordlength, it boggles the mind the degree of care required to lessen
external EMI/RFI, bypass power supply problems, to say nothing about the
stability of the PLL required! This makes clear the myth of the >20 bit converter,
which may have >20 bit quiescent noise, but how does it perform with real
world signals?
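The simplified formula quoted in this footnote is easy to evaluate numerically. The sketch below is only an illustration: the function names are made up, and the conversion of 6.02 dB per bit is the usual rule of thumb rather than something stated in the preprint.

```python
import math

def jitter_limited_resolution_db(jitter_s, bandwidth_hz):
    """Resolution (dB) = 20 log10(time deviation x 2 x information bandwidth)."""
    return 20.0 * math.log10(jitter_s * 2.0 * bandwidth_hz)

def equivalent_bits(resolution_db):
    """Approximate wordlength, assuming about 6.02 dB per bit."""
    return -resolution_db / (20.0 * math.log10(2.0))

at_20k = jitter_limited_resolution_db(25e-12, 20e3)
at_40k = jitter_limited_resolution_db(25e-12, 40e3)

print(round(at_20k, 1))                    # -120.0
print(round(equivalent_bits(at_20k), 1))   # 19.9
print(round(at_40k - at_20k, 2))           # 6.02 (dB lost by doubling bandwidth)
```

Halving the jitter at the doubled bandwidth brings the result back to -120 dB, which matches the "less than 12 picoseconds" remark.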

7 Not exactly must. In a self-contained audio for video post studio, it is possible
to make the A/D the master clock for everything, by using an AES/EBU to video
sync converter, thus forcing the video clock to slave to the audio instead of the
other way around. Presto: Low jitter while mastering for video, and a
product-design opportunity for fussy audio mastering engineers who must work with
video.

8 I'm sure you're curious as to the brand. It's the TC System 6000. Other current
converter manufacturers who claim to have produced "jitter-immune" units
include Prism, dB Technologies, Benchmark, and Weiss. The lockup time in
the Weiss, Prism and TC units is virtually instantaneous, pointing out that
jitter-free design does not require a long lockup. In fact, the mathematics
proves that any buffer longer than about a sample is superfluous, since jitter is
a small fraction of a sample period.

9 Bob Harley of Stereophile measured output jitter as low as 10 to 100 picoseconds
with some of the best audiophile CD transports, but as high as 1500 picoseconds
(1.5 ns) with a DAT machine. Different methods of measuring jitter yield
different results, but the approximate relative values will remain.

10 In the case of a cheap crystal oscillator, the external clock is cleaner than the
internal. The cleaner the incoming clock, the less work the PLL has to do. But
the output jitter of a PLL can never be better than its nascent or intrinsic jitter,
and is typically worse. The output jitter of a PLL is a combination of three
things: its intrinsic jitter, incoming jitter, and the PLL's jitter attenuation.

11 However, a crystal which is off the standard center frequency can cause locking
problems, since many low jitter PLL designs have a narrow lock range. But if
the system components lock, then an off-standard crystal won't affect digital
dubs at all. Some PLLs have a narrow and wide setting to deal with sources that
are a bit off the standard. Switching to wide increases frequency tolerance, but
also increases the PLL's jitter. Don't be concerned, as long as the PLL is not
driving a converter.

12 Reports from the musical artists themselves led engineers at Sony Corporation
to work on improving the jitter in their CD cutting systems. Without outside
influence, some major artists had been reporting that their CD pressings did
not sound as good as the reference CDRs they had received. We theorize that
irregular pit spacing or inadequate pit depth on the CDs themselves is affecting
the player's servo mechanism. The servo mechanism and sample clock share a
common power supply, so with poor power supply bypass in the player, simple
power or ground leakage may affect the stability of the clock. It doesn't take
much leakage to change a few picoseconds. Critical listeners making CDRs
have heard superior sound with SCSI-based CD recorders than with
standalone CD recorders. In standalone recorders, the master clock driving
the laser is slaved to the AES/EBU input, while computer-based recorders use a
FIFO buffer and a crystal clock to drive the laser.

We theorize that the reason we have not noted such differences with DAT
machines is that even if the incoming differences are passed on to the
medium, they are swamped by the much-higher jitter in a typical DAT machine
than in a typical CD player. What's a few picoseconds out of thousands?

13 Julian Dunn clarifies: "This could be considered to be a synchronisation issue
and these are covered in AES11. These define the permitted output alignment
error (+/-5% of a frame period) and the tolerance to input timing offset
(+/-25% of a frame period) before the delay becomes uncertain.

The specifications for the interface itself (AES3, IEC 60958) do not allow a
receiver's ability to decode data to depend on the relative alignment of clocks,
as long as the dynamic variation is within the jitter tolerance spec. (about
+/-4% of a frame period at low jitter frequencies)."

14 If you're spending $30,000 and upward on a digital console, request the
manufacturer to sign an agreement that the digital inputs and wordclock
framing tolerances must meet or exceed the AES11 synchronization specs or
the manufacturer will correct the problem at no charge. This amounts to a sad
wakeup call to the manufacturers, but consumers should be entitled to
interface real-world equipment to their consoles.

15 Those dyslexics in the audience will appreciate that I am taking slight liberty
with this discussion for ease of understanding. Since the left hand end of the
bitstream is the last to go into the flip flop, the "fourth bit" counted from left to
right is actually the fifth bit to go in! This leads to the requirement that
software has to decide whether to make the left or right end of the bitstream be
the most significant bit. Intel and Motorola have been fighting over this for
decades, so if you don't follow this part, you're not the only one!

16 Many bitclocks are 32x the wordclock, or greater, to allow for a longer internal
wordlength. A typical PLL may generate a superclock which is 128, 256 or even
384 times the wordclock frequency, and is then divided down using a simple
divider.

17 The J-Test is a special signal designed to aggravate a D/A converter's jitter. It
contains a fundamental signal at 1/4 the sample rate, which is 11.025 kHz at
44.1 kHz SR, and a low-frequency component added to deliberately add
data-jitter on the AES input. The test is particularly designed to pick out the
interaction to the sample clock from the data on the AES/SPDIF interface that
is used to derive the sample clock. When AES/EBU is not involved, it would be
more practical to use a simple clean high frequency tone.

Julian: "There are four 24 bit numbers in a sequence that is 192 samples long
that repeats.

0xC00000, 0xC00000, 0x400000, 0x400000 (x 24, i.e. 96 samples)
0xBFFFFF, 0xBFFFFF, 0x3FFFFF, 0x3FFFFF (x 24)

Hexadecimal   Binary
C00000        1100 0000 0000 0000 0000 0000
400000        0100 0000 0000 0000 0000 0000
BFFFFF        1011 1111 1111 1111 1111 1111
3FFFFF        0011 1111 1111 1111 1111 1111"

The 16-bit version of the J-Test signal is currently available on a CD from
Audio Precision and on another test CD from Checkpoint Audio in the
Netherlands.

Further information can be found at Julian's website www.nanophon.com

18 The measured amplitude of the noise depends on the number of points (bins)
in the FFT, the window which is used, and the A/D converter in the
measurement equipment, which is why each reviewer's results will be
different. These measurements were taken with a 32K point FFT with an
averaging time of about 24 seconds, and a Hanning window.
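The 24-bit J-Test sequence quoted in footnote 17 can be written out directly from the four hexadecimal words. The Python sketch below is only an illustration of that description: the function names are made up, and treating the words as two's-complement values is an assumption based on normal PCM practice rather than something stated in the quote.

```python
def to_signed24(word):
    """Interpret a 24-bit word as a two's-complement sample value."""
    return word - (1 << 24) if word & 0x800000 else word

def jtest_period():
    """One 192-sample period of the 24-bit J-Test sequence."""
    first_half = [0xC00000, 0xC00000, 0x400000, 0x400000] * 24    # 96 samples
    second_half = [0xBFFFFF, 0xBFFFFF, 0x3FFFFF, 0x3FFFFF] * 24   # 96 samples
    return [to_signed24(w) for w in first_half + second_half]

period = jtest_period()
sample_rate = 44_100

print(len(period))                 # 192
print(period[0], period[2])        # -4194304 4194304
print(period[96], period[98])      # -4194305 4194303 (shifted down by 1 LSB)
print(sample_rate / 4)             # 11025.0 Hz, the high-frequency tone
print(sample_rate / len(period))   # 229.6875 Hz, the low-frequency component
```

The four-sample pattern gives the tone at one quarter of the sample rate, and the one-LSB shift every 96 samples gives the low-frequency component that toggles many data bits at once.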


19 There is only one right "kind" of digital cable, one whose impedance is a
correct match for the circuit (e.g. 75 or 110 ohms). Some audiophile
manufacturers have made so-called digital cables which are improper for the circuit,
but since they affect the sound of a typical consumer-grade D/A converter in
some unpredictable way (usually adding jitter, not reducing it), consumers
have been known to play with such cables to tune their systems. It's a losing
battle, because the cable-induced jitter reduces resolution and colors the
sound.


Chapter 20

Tips And Tricks


I. Introduction 


This little chapter reveals some previously-untold
secrets of how to maintain and run a digital
audio studio, including dealing with the vagaries of
timecode that just won't stay stable, how to make
clean AES/EBU connections, advice on hard disk
formatting, and more.


II. Timecode and Wordclock
in a Digital System


Drifting drifting drifting 

An engineer attempted to synchronize an
analog tape deck, sequencer and digital tape deck by
slaving everything to the timecode coming from the
analog deck. The sync seemed to work fine for a
little while, but after a couple of minutes, he
noticed that the analog deck was drifting out of sync
with the rest of the system. The reason for the drift
was that there must (and can) be only one master in
any system, and in this case there was already a
master clock in the digital system: the digital audio
clock. When a computer (or interface) receives
timecode, it takes a stamp or trigger from the first
valid timecode it sees. From that point on, the
interface ignores incoming timecode; it runs its own
timecode, locked to the digital audio clock. The two
sources would drift apart if the source of the
incoming timecode is not locked to wordclock. In
this instance, timecode from the analog tape deck is
independent, based on the speed of the tape deck.

"There must be only one master in any system"


One method to synchronize an analog deck with
a digital system is to slave the analog deck; in order
to avoid introducing wow and flutter, a special type
of flywheel synchronizer speeds or slows the analog
deck, holding it within an acceptable margin. The
other method is with a digital clock
generator/timecode regenerator specially designed
to lock to analog-style timecode, that locks to the
analog deck and slowly adapts the master clock to
the rate from the analog deck. The latter method is
likely to cause higher jitter; it is much better (as in
the first method) to have the A/D on internal sync.


When locking two digitally-based systems
together via timecode, drifting will result if one or the
other is set to the wrong timecode or wordclock rate.
As we mentioned, usually the sequencer triggers to
the first burst of timecode and then runs on
wordclock. To prevent drifting, make sure the
sequencer is set to receive the same wordclock and
timecode as the DAW is transmitting. It would be
nice if all sequencers did this automatically, but this
only happens in a perfect world.

Pull-ups 

Things are far simpler without video. At 44.1
kHz SR and 30 (25) timecode fps, there are exactly
1470 (1764) samples per frame, so just divide the
audio rate by an exact integer to arrive at the
timecode rate. But when NTSC video is involved, the
timecode rate is slower, 29.97 fps, which yields a
non-integer number of samples per frame.
Normally the wordclock is slaved to the video, so we
require a sophisticated wordclock generator which
takes in video and produces wordclock with the
right ratio, called a pull-up, by approximately 0.1%.
If not, then the two systems will drift apart and
audio-visual sync (e.g., lip sync) will be lost.
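The samples-per-frame arithmetic can be checked with exact fractions. This Python sketch uses the 29.97 fps figure as given in the text (an assumption for illustration; the function name is also made up).

```python
from fractions import Fraction

def samples_per_frame(sample_rate_hz, frames_per_second):
    """Exact number of audio samples in one timecode frame."""
    return Fraction(sample_rate_hz) / Fraction(frames_per_second)

print(samples_per_frame(44_100, 30))   # 1470
print(samples_per_frame(44_100, 25))   # 1764

ntsc = samples_per_frame(44_100, Fraction("29.97"))
print(ntsc.denominator != 1)           # True: not a whole number of samples
print(round(float(ntsc), 2))           # 1471.47
```

Because the NTSC result is not an integer, a simple divider cannot derive one rate from the other, which is why a dedicated wordclock generator is required.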

Wordclock Voltages 

The problem with standards is there are so
many of them! With Johnny-come-lately digital, no
voltage standard was developed for wordclock, and
this lack of standard has produced a chaotic
situation. Many of the earliest wordclock generators
were based on video sync (blackburst) generators,
which produce 4 volts peak to peak into a 75 ohm
load (abbreviated 4 v p-p). This is a fairly expensive
circuit, so soon wordclock generators appeared
based on the video standard of only 1 v. Other
manufacturers settled on a TTL-level standard,
which if terminated is 2.5 volts, and unterminated
could be 4-5 volts! Chances are if a device will not
lock to incoming wordclock, either the receiver is
insensitive or the generator is not putting out
enough voltage. At this point, the only way out of
this mess is to insist on wordclock generators that
produce 4 volts and wordclock receivers that can
accept anything between 1 and 5 volts.
Impedance-matching is not that critical on low-frequency lines
such as wordclock, so if the cable run is short, you
may be able to make the circuit work by removing
the load termination, or in extreme cases, lowering
the value of the source resistors in the generator
below 75 ohms. If this doesn't work, then you need a
new wordclock generator. An oscilloscope can verify
the amplitude of the wordclock. Caveat emptor.
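The terminated and unterminated voltages follow from a simple resistive divider. The sketch below is a rough model only: it assumes a 5 volt open-circuit TTL driver with a series source resistor, ignores cable effects, and uses a made-up function name and a hypothetical 22 ohm replacement resistor.

```python
def load_voltage(open_circuit_v, source_ohms, load_ohms=None):
    """Voltage across the receiver for a resistive source and load."""
    if load_ohms is None:          # no termination: the full swing appears
        return open_circuit_v
    return open_circuit_v * load_ohms / (source_ohms + load_ohms)

print(load_voltage(5.0, 75.0, 75.0))             # 2.5 (terminated)
print(load_voltage(5.0, 75.0))                   # 5.0 (termination removed)
print(round(load_voltage(5.0, 22.0, 75.0), 2))   # 3.87 (lower source resistor)
```

Both remedies mentioned in the text, removing the termination or lowering the source resistance, raise the voltage seen by the receiver.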


House Video and DARS Sync

Surprisingly, in the year 2002, video and digital
audio interfacing is still in a primitive state.¹ This is
because neither wordclock nor video contain
markers as to the beginning of the digital audio
frame or channels. If you use house video directly
you will produce a word clock of the correct
frequency but not the correct phase.* It is highly
likely that two video recorders containing digital
audio will not be phase locked. This will cause
unpredictable phase shift because there is no phase
reference in video sync, so think twice if you are
forced to lock multiple audio devices via video sync.
Wordclock has fewer such problems but it still does
not define channel beginnings, and this lack can
result in channel reversals, for example. The only
dependable sync reference for multiple devices is
AES-11, also known as DARS (digital audio
reference signal), which is equivalent to AES/EBU
with muted (black) audio; it maintains channel
beginning markers (blocks) and channel-to-channel
relationships. If a video house reference
must be used, I suggest hooking it to a single
distribution amplifier that then derives AES-11 sync. At
least from that point on all subsequent devices will
be properly referenced to each other. And in
multichannel work, do not split channel processing
to multiple devices which only accept wordclock
sync, which can cause latency differences (many
samples of error) or indeterminate sync, a variance
of 1 or more samples between channels. There
should be no interchannel sync problem if devices
carry all channels and are used in a chain, each one
locked to the previous. But see Chapter 19 for jitter
and framing considerations when daisy-chaining. I
caution against indiscriminate use of new interfaces
such as USB and Firewire for multichannel audio
until we understand the latency issues therein.
When in doubt, insert coherent test signals and test
for phase shift between channels.


III. Debugging AES and S/PDIF 
Digital Interfaces 


"The first step in fixing interface
problems is to separate the
issues into two parts: hardware
and software."


When the AES/EBU and S/PDIF interfaces
were created, the use of standard audio connectors
and cabling seemed like a godsend, but people were
tempted to use regular audio cables, which were
never intended to carry the high frequencies of
digital audio (about 6 MHz bit rate for 48 kHz SR).
So eventually we ended up with special RF-rated
cables attached to our old-fashioned XLR
connectors. This identity problem will likely go away
as we move from 2-channel to multichannel, which
generally will require specialized connectors.† The
easiest way to debug interface problems is to divide
the issues into two parts: hardware and software.
The hardware includes the cables, connectors, and
signal levels, and the software is the bitstream and
how it is interpreted.
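The "about 6 MHz" figure can be derived from the frame structure of the interface. The sketch below assumes the standard AES3 layout of two 32-bit subframes per sample period with biphase-mark coding, which uses two signalling cells per data bit; the function and constant names are illustrative only.

```python
SUBFRAME_BITS = 32          # one subframe per channel
SUBFRAMES_PER_FRAME = 2     # left and right

def aes3_rates(sample_rate_hz):
    """Return (data bits per second, biphase-mark cells per second)."""
    data_bits_per_second = sample_rate_hz * SUBFRAME_BITS * SUBFRAMES_PER_FRAME
    biphase_cells_per_second = 2 * data_bits_per_second
    return data_bits_per_second, biphase_cells_per_second

print(aes3_rates(48_000))   # (3072000, 6144000)
print(aes3_rates(44_100))   # (2822400, 5644800)
```

At 48 kHz the cell rate is 6.144 MHz, well beyond what ordinary microphone cable was designed to carry.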


* Thanks to Julian Dunn (in correspondence) for this good advice.

† The multichannel MADI standard interface uses BNC connectors, which are
already fully compatible with a video installation.


For example, getting consumer DAT machines 
to record from a digital source used to be a headache 


247 Tips and Tricks 




(DAT machines are already officially obsolete as of 
this book publication*). Here’s the typical behavior 
of an apparently malfunctioning machine: Set the 
machine to digital input; without a digital source, 
the record-ready button will not light. Connect a 
digital source to the DAT, and the record-ready will 
operate as well as the sample rate indicator;
however, the machine will not go into record. Some 
machines show incoming level on their meters, 
others’ meters appear to be dead. If we divide the 
problem into two parts, it will be clear that the 
problem is not in hardware but in software, for the 
machine’s record-ready light tells us the machine is
locked, it’s also indicating the incoming sample rate
and sometimes even the meter is 
functioning. The lock indicator is the line 
of demarcation between hardware and 
software problems, as in this figure at 
left. 


[Figure caption, beginning lost:] ... consists of digital audio data
mixed with status flags. The
receiver locks to the incoming
signal and separates it into its
component parts for use by the
DAW, DAT machine, etc.


If there is no lock, or the lock
indicator is intermittent, or the output audio cuts in 
and out, then look to the left side of this diagram for 
hardware problems (voltage level or cabling 
problems). Otherwise, the problem is in software, 
either with the flags being transmitted or the way in 
which the device interprets the flags (more on flags 
in a moment). 
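
The hardware/software split described above can be sketched as a tiny triage function. This is only an illustration of the rule of thumb; the function name and its three boolean inputs are hypothetical and stand for what an engineer observes on the receiving device.

```python
def classify_interface_fault(locked: bool, lock_steady: bool, audio_steady: bool) -> str:
    """Apply the lock-indicator rule of thumb.

    No lock, an intermittent lock, or audio cutting in and out points to
    hardware (voltage level or cabling). A steady lock with a device that
    still misbehaves points to software (the flags or their interpretation).
    """
    if not locked or not lock_steady or not audio_steady:
        return "hardware"
    return "software"


# The DAT machine in the example: locked and showing the sample rate,
# yet it refuses to record, so the rule points at the flags.
print(classify_interface_fault(locked=True, lock_steady=True, audio_steady=True))
```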

Fixing Interface Hardware Problems 

To the outside world, a digital interface either 
appears to work or not: we never have much idea 
how well it’s working unless either it stops or we 
stop to measure it. I’d love to see signal-quality 
indicators in a receiver; even a green-yellow-red 

* Replaced by hard disc, CDR and DVD-R.




lock indicator would be nice. Currently, the only way 
to assess a hardware interface is to measure its 
objective performance by looking at the width of an 
eye pattern on an oscilloscope. Always use matched-
impedance cabling, especially for long runs (either
75 ohm low-loss coax for the unbalanced interface,
or 110 ohm cable for the balanced). The balanced
signal should measure between 2 and 7 volts p-p,
while the unbalanced signal should not be below 0.5
V p-p, into a terminated load (all S/PDIF and
AES/EBU inputs are terminating).

With the balanced interface, shields are actually 
unnecessary, as can be illustrated by the success of 
Belden’s Mediatwist™, consisting of four bonded- 
twisted pairs for up to 8 channels, and performing 
as well as the highest-grade coax. In fact, standard 
Cat 5 twisted pair Ethernet cable makes a very good
AES/EBU cable, second only in quality to 
Mediatwist. The biggest problem with the hardware 
of the unbalanced consumer interface is the low 
voltage (0.5 v p-p), which can easily degrade with 
lossy coaxial cables or long cable lengths. These 
problems could have all been avoided if only the 
S/PDIF interface protocol had specified 1 volt like
the AES3-ID standard, which uses a BNC 
connector, popular with video houses. 

Improving the stability of the unbalanced 
interface. The stability of the unbalanced interface 
can be considerably improved by upgrading to
special low-loss 75 ohm cable, and/or by raising the 
output voltage from 0.5 volts to 2.5 volts, easily done
by replacing the voltage divider at the transmitter 
with a single 75 ohm resistor, as in this next figure: 









The low voltage of the 
coaxial digital interface 
can be overcome by
replacing the transmitter's
2-resistor pad with a single
75 ohm resistor.

This modification to the transmission side works so
well because it raises the noise margin of the
receiving circuit at no significant cost or
interference with other circuits. Warning: modifying
circuits usually voids the warranty. Although the AES
standard is between 2 and 7 volts, note that
commonly the same audio receiver chip is used for
both AES/EBU and S/PDIF decoding and it can
accept from as low as 200 mV p-p to as high as 7
volts, so higher voltages are usually not a problem
with S/PDIF, but extremely low source voltages reduce
noise margin and may introduce dropouts or
glitches. Input transformers are almost always used
for both the balanced and unbalanced interface, so 
the major difference between AES and S/PDIF at the 
input is a change of connector and termination 
resistor between 75 and 110 ohms. 

Impedance Mismatches. A mismatched
impedance (as well as circuit imbalance) will result 
from putting an RCA connector on one end of an 
XLR cable without changing the source or load 
resistors. However, at short cable lengths and lower 
sample rates, impedance mismatches and voltage 
variations are far less of a problem than is 
commonly thought. As long as the signal gets 
through with adequate voltage and few reflections 


(probably not a problem with a short cable), the 
receiver chip will decode it regardless of the 
impedance, albeit at the cost of jitter and possibly 
reduced noise margin. And noise-margin in the 
digital circuit does not affect sound quality unless 
the digital signal is so low the receiver drops out and 
loses sync. There are several proper-impedance 
methods of connecting a digital balanced source to 
unbalanced load or vice versa, descriptions of
which are beyond the scope of this book. 3 

Cable Lengths. The higher the sample rate, the 
shorter the tolerable cable length, because of the 
possibility of interfering reflections from the 
impedances and connectors at each end of the cable. 
The AES3 standard specifies usable lengths up to
100 meters at 48 kHz, which is possible with careful
termination and high-bandwidth, matched-
impedance cable. However, 1/4 wavelength is the
critical length where reflections can become their
worst, so impedance and termination errors will be
aggravated with cables that are close to about 20
meters, or 66 feet (48 kHz SR),* or 33 feet (96 kHz
SR). The critical length issue is one probable reason
why standard-length mike cables make bad digital
audio interfaces. Neither the XLR nor the RCA
connector was designed with exacting impedance 
specifications, so avoid passive hardware patchbays, 
splices, and multiple intermediate connectors
which will tend to exacerbate impedance problems.
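
The critical-length figures above can be checked with a short calculation. This is a rough sketch, not a cable design tool: it takes the prime bit frequency as 64 bits per sample period (3.072 MHz at 48 kHz, as in the footnote) and assumes a cable velocity factor of 0.8, which is my own illustrative value and varies from cable to cable.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def prime_bit_frequency(sample_rate_hz: float) -> float:
    """One frame per sample period, 64 bits per frame (2 subframes x 32 bits)."""
    return 64 * sample_rate_hz


def quarter_wave_length_m(freq_hz: float, velocity_factor: float = 0.8) -> float:
    """Quarter wavelength in the cable; velocity_factor is an assumed value."""
    return velocity_factor * C / freq_hz / 4.0


for sr in (48_000, 96_000):
    f = prime_bit_frequency(sr)
    metres = quarter_wave_length_m(f)
    print(f"{sr} Hz SR: {f / 1e6:.3f} MHz, quarter wave about {metres:.1f} m "
          f"({metres * 3.2808:.0f} ft)")
```

With the assumed velocity factor the results land close to the 20 meter and 33 foot figures quoted above; a slower cable gives a shorter critical length.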

Fortunately, cables do not have to be cut to the 
same length, since the AES11 standard permits a
framing tolerance of 25%, and 25% of a 192 kHz

* The 48 kHz AES/EBU interface has a prime bit frequency of 3.072 MHz,
extending to about S MHz.











frame is 1.3 µs, the production of which would
require a cable length difference of over 200 
meters. In addition, should cable lengths or 
equipment delays exceed the 25% error, the only 
signal degradation would be to insert a delay of a 
sample to the signal (or more samples for much 
larger mismatches).* 
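
The framing arithmetic above can be reproduced in a few lines. This is a sketch under one assumption of mine: the signal travels at about 0.66 times the speed of light in the cable (a typical figure for solid-dielectric cable; faster cables give an even longer distance).

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def framing_tolerance_s(sample_rate_hz: float, fraction: float = 0.25) -> float:
    """Allowed timing offset: a fraction of one frame (one sample period)."""
    return fraction / sample_rate_hz


def cable_difference_m(delay_s: float, velocity_factor: float = 0.66) -> float:
    """Cable length difference that would produce the given delay."""
    return delay_s * velocity_factor * C


tolerance = framing_tolerance_s(192_000)
print(f"25% of a 192 kHz frame: {tolerance * 1e6:.2f} microseconds")
print(f"Equivalent cable length difference: {cable_difference_m(tolerance):.0f} m")
```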

Optical Cables and Length. Obviously,
electrical impedance is not a consideration when 
the interface is optical. The main concerns are bit 
integrity and jitter. And as we explained in Chapter 
19, as long as the bit integrity is maintained, then 
jitter is only a consideration when delivering signal 
to a D/A converter or sync signal to an A/D. When 
using jitter-susceptible DACs, try to avoid Toslink
optical connections because their low bandwidth (3
MHz for the Sharp brand, up to 6 MHz for the
Toshiba) exacerbates interface jitter. But the bit
integrity is perfectly acceptable on a plastic Toslink
interface as long as the lengths remain under 5
meters (some receiver models support up to 10 
meters) beyond which there is unacceptable signal 
loss. 

If you have to run long optical cable, a perfectly 
legitimate test for cleanliness of an optical interface
is margin distance before dropout. While looking at the
lock indicator on an AES receiver, or simply
listening to the audio, disconnect the cable from the
input and slowly pull it outwards. The amount of
distance you can pull the connector before losing
lock is an indicator of the margin of sensitivity of 
the receiver and the strength of the signal at the 

* Thanks to Julian Dunn (in correspondence) for clarifying the framing
information.


receiving end of the cable. If you cannot pull the
cable out at least 1/8 inch, preferably 1/4 inch or
more, then there is probably too much loss in your
cable length, or the transmitter is weak, or the
receiver insensitive. It is possible to get more output
from a Toshiba transmitter by changing some
resistors, which can then give up to 60 meter
transmission, but then a short cable, which has less
loss, will overload the receiver. Glass fiber has much
less loss than plastic, and can transmit for
thousands of feet; it also has superior bandwidth
and therefore causes fewer interface jitter
problems, with jitter as low as any good copper
connection. Glass fiber connections can have even
lower interface jitter than unbalanced copper
connections because they eliminate ground loops
and EMI sensitivity. I’ve seen some manufacturers
adapt Toslink connectors to glass fiber, but if you
want dependable long-length optical transmission,
the best solution is to change receivers and
transmitters to glass-type, which are electrically
compatible once you have converted from optical.

Fixing Interface Software Problems 

If difficulties still remain after eliminating 
hardware problems, then software issues are 
obviously the cause, and, sadly, these are much less 
straightforward to pin down and eradicate. The DAT 
machine cited above probably failed to record 
because the data stream was copy-protected, or the
machine was expecting a professional channel-
status bit when the consumer bit was presented, or
because a sample-rate flag was misrepresented. The
same software considerations apply both to copper
connections and optical, as it is possible to feed the





consumer or professional bitstream down an optical 
cable, or multichannel protocols such as the 
multichannel MADI or Sony’s DSD (multichannel 
protocols are beyond the scope of this book). 

The flags are the road signs of the bitstream,
officially known as channel-status bits. Over the
years, the standard has been abused, evolved,
multiply-interpreted, mutilated, or just plain
forgotten about, like the "detour ahead" sign which
some worker never put away after repairing the 
road. This may sound like heresy, but I think the 
current implementation of the standard is so poor 
that many times it would be best for all receivers and 
recorders to ignore the flags and ask for human 
help. For example, one common problem is a flag 
saying the sample rate is "unindicated," which stops
some DATs from recording. Ironically a receiver 
can’t read a flag unless it's already locked to the 
sample rate, so it must know what the rate is without 
the flag! Therefore, it’s illogical for a machine to 
reject a digital audio signal because the sample rate 
is not indicated. And with the advent of dual-AES 
connections for double sample rates, each channel 
is at half the final rate, so the flag may be wrong 
anyway. The human being should be the traffic cop 
making the final decision, not the machine; thus the
smartest DAWs make only certain assumptions 
about the bitstream, otherwise letting the user make 
adjustments from menus and checkboxes. In the 
case of the recalcitrant DAT machine, it may be 
necessary to insert a channel-status-bit analyzer 
and/or modifier, changing flags until the machine 
begins to record. 4 


The Critical Flags 

• The status of a bit (flag) called the PRO bit distinguishes
the consumer bitstream from the pro.
However, the pro bitstream can run on consumer
connectors and vice versa, and it’s done all the
time. This includes the Toslink optical interface,
XLR, RCA and BNC, any of which can be used to 
carry consumer or pro information (by de facto 
but not official standards). So, never assume that 
the bitstream matches the connector unless you 
have investigated the equipment manuals or 
menus, or measured the contents of the bitstream 
with a tester. Fortunately, the audio itself is in a 
common place in both PRO and CONSUMER bit 
streams, and with some care, the two bit streams 
can be somewhat interchanged. 

• Although the interface can send full-bandwidth 
2-channel PCM data, it also has been used to
transmit coded (data-compressed) multichannel
data such as Dolby Digital and DTS. The Normal-
versus-Data bit is used to define coded
multichannel data, which cannot be read by an
ordinary DAW or D/A. As a precaution, these
decoders will produce no audio unless the PRO bit
is set (even on an RCA connector). The main
danger is that an ordinary D/A or digital recorder
may ignore the data bit and send full-level noise 
over the loudspeakers. For this reason I usually 
turn the monitor gain down whenever beginning 
to monitor an unknown bitstream! 

• The three emphasis bits in the professional 
stream partially overlap the copy-prohibit bit and 
the single emphasis bit in the consumer stream. 
Fortunately, professional DAWs ignore the copy 




prohibit bit and most recordings now are made
without emphasis. 5 In general, the copy-prohibit
bit and SCMS bits are only used when
recording to consumer-grade CD and DAT
recorders. Future digital interfaces will
encompass far more rigid copy-protection 
schemes, which will probably introduce further 
complications for audio professionals. 

• The consumer bitstream can transmit program
IDs (for automatic tracking in DATs and CD
recorders) but the pro bitstream has no such
provision, except that some Sony machines will
interpret program IDs on the pro interface.

• The consumer bitstream was originally designed
to carry 20 audio bits, with the remaining 4
auxiliary bits available to carry a low-resolution
auxiliary channel (e.g., talkback). But the
consumer bitstream can utilize all 24 bits to carry
up to 24-bit audio, and this has become de facto
regardless of how the flags are set. Although the
standards committees spent much time carefully
revising the standard so the consumer bitstream
could flag the use of those 4 bits, currently most
transmitters and receivers ignore those flags, and
most current receivers default to assume 24-bit
audio. It’s up to the user to take appropriate
action: a bitscope (see chapters 2 and 16) may
help sort out the issues. One D/A converter
(Prism) and one console manufacturer (Yamaha)
rigidly follow the standard and automatically
truncate bits beyond 20 on the consumer
connector; pro connectors must be used on those
devices if you want to use all 24 audio bits.

• The esoteric consumer flags that govern category
code practically affect two classes of recording
equipment: standalone 16-bit CDR and DAT
recorders. If the source machine’s category code 
is set to CD, the recorders interpret user bits as 
track IDs, and if this category code is wrong, the 
recorders may write undesired track IDs. One 
DAW (SADiE) is capable of sending DAT and CD 
track IDs, and thus it must be set to consumer 
status and CD category code and there must be no 
bitstream modifiers in the line between SADiE 
and the recorder in order for automatic tracking 
to work. 
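
To make the overlap between the consumer and professional flags concrete, here is a small sketch that decodes the first channel-status byte. The bit positions are my reading of IEC 60958 and AES3 (bit 0 = consumer/professional, bit 1 = normal audio/data, consumer bit 2 = copy flag, consumer bit 3 = emphasis, professional bits 2 to 4 = emphasis field) and should be checked against the standards before being relied upon. The function name and the packing of bit 0 as the least significant bit of the byte are assumptions for illustration.

```python
def describe_channel_status(byte0: int) -> dict:
    """Decode a few flags from channel-status byte 0 (bit 0 = LSB, assumed)."""

    def bit(n: int) -> int:
        return (byte0 >> n) & 1

    info = {
        "professional": bool(bit(0)),    # the PRO bit
        "non_audio_data": bool(bit(1)),  # Normal-versus-Data bit
    }
    if info["professional"]:
        # In the pro stream, bits 2-4 together form the emphasis field.
        info["emphasis_field"] = (bit(2), bit(3), bit(4))
    else:
        # In the consumer stream, bit 2 is the copy flag and bit 3 is emphasis.
        info["copy_permitted"] = bool(bit(2))
        info["pre_emphasis"] = bool(bit(3))
    return info


# The same bit 2 means different things depending on the PRO bit:
print(describe_channel_status(0b00000100))  # consumer: copy flag set
print(describe_channel_status(0b00000101))  # pro: first emphasis bit set
```

A receiver that guesses the stream type from the connector rather than from bit 0 will misread exactly these overlapping bits, which is the failure described in the list above.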

Two-wire 96K and 192K 

Originally, the highest sample rate that could be 
carried on an AES/EBU interface was 48 kHz 
(slightly higher with varispeed). In order to double
the sample rate with recorders that can only handle
48 kHz, a system was invented that places half the 
samples on one cable and half on another, each 
cable running at half the final rate. One cable 
carries all the left channel samples and the other the 
right. If you plug one of these cables into a standard 
stereo DAC you will hear a mono signal that sounds a 
little strange since the timing between the two ears 
is incorrect. The main concern when using the two-
wire method is to write impeccable documentation,
since there is no standard flag to indicate the dual-
wire method is being used, nor which channel is
which. At the time of this writing, there is no official
single-wire interface for 176 and 192 kHz SR, so at
least 2 cables are needed, and often 4 for stereo;
good luck to anyone who scrambles those cables!
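
The two-wire scheme can be sketched as follows. The code assumes the common arrangement in which each cable carries one channel, with successive samples placed alternately in the two subframes of each frame; the function names are hypothetical, and actual equipment should be checked against its documentation.

```python
def split_dual_wire(left, right):
    """Split a double-rate stereo signal onto two base-rate cables.

    Each cable carries one channel. Within a cable, subframe A holds the
    even-numbered samples and subframe B the odd-numbered ones (assumed).
    """
    if len(left) != len(right) or len(left) % 2:
        raise ValueError("channels must have equal, even length")
    cable_left = list(zip(left[0::2], left[1::2]))
    cable_right = list(zip(right[0::2], right[1::2]))
    return cable_left, cable_right


def join_dual_wire(cable_left, cable_right):
    """Rebuild the double-rate channels from the two cables."""
    left = [sample for frame in cable_left for sample in frame]
    right = [sample for frame in cable_right for sample in frame]
    return left, right


left = [0, 1, 2, 3, 4, 5]
right = [10, 11, 12, 13, 14, 15]
cable_l, cable_r = split_dual_wire(left, right)
print(cable_l)  # each tuple is one frame: (subframe A, subframe B)
print(join_dual_wire(cable_l, cable_r) == (left, right))
```

A standard stereo DAC fed one of these cables treats subframe A as left and subframe B as right, so each ear receives alternate samples of the same channel, which is the strange-sounding mono described above. Swapping the cables at the receiver exchanges the channels silently, hence the need for documentation.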

Given all these violations and exceptions, it’s 
amazing the AES/EBU standard works at all! 


IV. How To Get Good Audio Extraction 

I rarely recommend brands outright, but in this 
case I'll make an exception: Plextor. Plextor CD-
ROM readers have been specifically designed for
excellent audio. 6 Audio extraction from CD is not as
easy as the computer industry has implied; it is not
the same as reading data from CD-ROM, which can
be done at high speed. Most drives fail at this chore, 
and default to speeds which cause dropouts or 
glitches in the audio. In contrast, the Plextor drives 
have available a special protocol which will read and 
reread any portion of a disc, slowing down when 
needed, until they get a good read. The audio 
program or operating system must be designed to 
work with the Plextor, which needs proper 
handshaking in order to speed up and slow down. As 
of this writing, only certain programs on the PC 
provide this functionality, ironically, not on the 
Macintosh. 

V. Compilation CDs/CD-On-Demand 

Producing compilation CDs is a problem for the 
quality-conscious engineer. In an ideal world, the 
same mastering engineer who produced the original 
discs should produce the compilation, which helps
ensure a unity of sound. In an ideal world, 
compilation CDs should be made from original or 
early-generation sources, not by copying from final 
masters or pressed CDs. For if a track’s level (or EQ)
needs to be adjusted, then the sound quality will
deteriorate when going from 16-bit to 16-bit,
especially when using a highly-processed 16-bit
master as the new source. A final master represents
the end of the line of a processing chain, including
limiting, which cannot be reversed, only the level 
can be lowered (yet sound deteriorates because of 
additional DSP calculations and the accumulation of 
16-bit dithers). 
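
A small numerical sketch shows why even a simple level change cannot stay inside 16 bits without loss. The sample values and the 1 dB reduction are arbitrary illustrations of mine; in practice the fractional part would be dithered rather than merely rounded, which adds another layer of noise on top of the dither already in the master.

```python
def requantization_errors(samples, gain_db):
    """Scale integer samples and report the error made by rounding back."""
    gain = 10 ** (gain_db / 20)
    results = []
    for s in samples:
        exact = s * gain          # generally not an integer
        rounded = round(exact)    # forced back onto the 16-bit grid
        results.append((s, exact, rounded, rounded - exact))
    return results


for original, exact, rounded, error in requantization_errors([1000, -2345, 12, 32767], -1.0):
    print(f"{original:6d} -> {exact:10.3f} -> {rounded:6d} (error {error:+.3f} LSB)")
```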

But in the real world, record companies usually 
don’t want to pay to redo that which they’ve already 
amortized, plus, it's extremely difficult and 
expensive to acquire the source masters from many 
different places. Nevertheless, even in the real 
world, we can still make some decisions to maximize 
quality of the compilation CD. We try to produce bit-
for-bit copies (clones) of as many cuts as possible,
the ones which work together level- and sound-
wise. For the same reason, if one of the cuts is out of
line and much louder, we try to convince the record
company not to take the least common denominator
approach; it is better to take the level of the loud
cut down, which avoids using degrading processing
on the majority of cuts. 

The phrase good-sounding CD-on-demand is
an oxymoron. As soon as users are given the ability 
to create their own CDs from previously-mastered 
product, change levels and then (hopefully) redither 
to 16-bit, the sound-quality will suffer. There is no 
shortcut nor substitute for a good mastering
engineer working from early-generation 
(unmastered) sources. 




VI. CD Text


Mastering engineer Jim Rusby, an expert on CD 
text, explains the process: 

CD Text is a facility that provides
titles, authors, and even lyrics on the
display of specially-equipped CD
players. Be aware that most of the
replicators (pressing plants) are ready
for CD Text, but many of the CD brokers
are not, even if they say they are.

There are two schemes for CD Text. One 
(and the most common) places the 
text in the lead-in area. The other 
extends it into the program zone; this 
scheme is also used for special 
applications like Karaoke. 

Sony has been freely distributing CD 
Text software at their Austria DADC 
web site. The mastering engineer 
organizes the text using the software 
and a set of binaries are generated 
that reside on a floppy, which is then
sent along with the disc master to the
broker or replicator. Fields are
available for such things as ISRC,
album name, etc. This scheme is used
by many (other than Sony) - the Doug 
Carson system supports it, for 
example. 

There are two general sources of 
problems with the process: 


1. Expectations. CD Text is only
guaranteed to work on CD Text- 
enabled CD Players. Performance of CD 
Text on computers (e.g. Windows Media 
Player) will vary depending upon the 
drive, software, and phase of the moon. 
Some clients confuse CD Text with 
CDDB databases—these are servers 
that your computer logs into and gets
info about the disc in your unit. 

2. Product ID. In recent years a number 
of CD burners began supporting CD 
Text. Consequently, the client types in 
the information, burns a disc and sends
it in for replication. The pressed discs
come back with no text on them. Be 
sure to tell your broker or replicator 
that this is a CD Text title. Don’t just 
send the disc in expecting all will be 
well. Most replicators have product 
codes. They will assign a piece number 
to your product that may indicate
what’s on the disc - and this code may
tell the cutting system what information
needs to be passed along. Many times
there is bogus character information in
the text fields of non-text titles. The
replicator doesn’t want to pass along
information that isn’t valid, so if they
think it is just straight audio they will
not activate the text feature. Some
replicators may require that text be
submitted using the floppy method.*




* Jim Rusby, on the Mastering Webboard.



VII. Why do many mastering engineers
use unbalanced connections between 
analog gear? 

My philosophy is: All other things being equal,
unbalanced is better, which boils down to a less is 
more philosophy. 

Here are the caveats: In a small room, where all 
the power is coming from a central source, and all 
the analog gear is plugged into that power and no 
analog audio enters or leaves the room, and you have 
your signal-to-noise and headroom issues all
straightened out, then unbalanced is almost always 
better-sounding than balanced. Most balanced gear 
is created out of unbalanced internal connections by 
adding additional stages of amplification, which 
often creates a loss of transparency; however, it’s
important to study the schematics and determine if
this is the case. In those cases, I may remove the
extra stages, also being aware of the internal gain
structure, headroom, and driving capacity of the 
internal parts, which are going to be exposed to the 
outside world. 

Exceptions: a) Equipment whose balanced
stages are so well designed that it is impossible to
design the same piece of gear with fewer stages
unbalanced than the balanced version. b)
Equipment which uses balanced topology
throughout, with impeccably-designed internal
components in a mirror-image configuration. But
I’m not so sure it sounds better because it’s
balanced or just because it’s better!


VIII. Analog tape simulation 
in the mixing process? 

I am concerned about recommending the use of
analog tape simulators in the mixing process unless
you have world-class monitoring which can tell you
unequivocally when (if) you’ve gone "too far."
There’s nothing worse than the sound of oversaturated
analog tape; turn the drive knob on the
simulator one step too far and the sound will turn
from "good" distortion to "bad." Once any damage
has been done, it cannot be undone without a remix
and it’s a lot easier to do alternate mixes at the time
of the first one! Furthermore, I’ve found that after
good mastering, a little bit of analog tape simulation
is enough; so using such a device in the mixing chain 
can be a problem, because it’s not possible to 
anticipate its interaction with the mastering
processors. As usual I recommend that mix engineers 
send two versions of a mix to the mastering house, 
one with and one without processing. This applies to 
any processor(s) on the mix bus, unless the 
processor is so adjusted that removing it would 
seriously alter the intent of the mix. 

Speaking of Flux 

For those trying to get that sound with analog 
tape, personally, I have found that analog tapes 
sound too saturated, undefined, and muddy at +9, for 
9 out of 10 projects in my experience. To be more
explicit, reduce the level until 0 VU = +6 dB over 200
nW/m (known colloquially as +6), which is the same
as 0 VU = +4 dB over 250 nW/m; this is the practical
limit for GP9, the hottest tape made by Ampex. It is
better, in my opinion, to run at +6 or lower and use




the extra as headroom, especially when using VU
meters. (See Appendix 5)
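
The equivalence between the two ways of quoting the operating level is ordinary decibel arithmetic, sketched below. I take the second reference fluxivity as 250 nW/m, an assumption on my part: it is the value for which the two descriptions agree to within a fraction of a dB.

```python
def flux_at_zero_vu(reference_nw_per_m: float, db_over: float) -> float:
    """Fluxivity at 0 VU when the machine is aligned db_over above a reference."""
    return reference_nw_per_m * 10 ** (db_over / 20)


plus6_over_200 = flux_at_zero_vu(200, 6)  # the level known colloquially as +6
plus4_over_250 = flux_at_zero_vu(250, 4)
plus9_over_200 = flux_at_zero_vu(200, 9)  # the level described as too saturated

print(f"+6 over 200: {plus6_over_200:.0f} nW/m")
print(f"+4 over 250: {plus4_over_250:.0f} nW/m")
print(f"+9 over 200: {plus9_over_200:.0f} nW/m")
```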

IX. ISRC Codes and UPC/EAN 

The UPC/EAN code is also called Mode 2 data
and is a barcode that contains information about the
product. Most times the Mode 2 data is added at the
plant, but DAWs include a space for that data to be
added by the mastering engineer. The International
Standard Recording Code, defined by the
RIAA, is a unique code for each track on the CD. This
allows automated logging systems to be used
at radio stations to track copyright
ownership/royalties. The system is very popular in
Europe and slowly gaining acceptance in the U.S.
The record label provides the codes to be entered
for each track.

ISRC contains exactly 12 characters; only the characters
without any dashes should be entered in the DAW.

In the ISRC code: ES-BOi-01-10503, the first two
characters are the country code (in this case, ES for
España), the next three characters are the code for the
original issuing record label, which owns the rights.
The next two digits are the year the song was
recorded, and the last five are recording codes
designated to the version of the song itself. That is,
Elton John’s version of Your Song will have a
different ISRC code from any cover of the same 
song. 
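
The field breakdown above (two, three, two and five characters) can be sketched as a small validator that also strips the dashes before entry into a DAW. The function name and the sample code "US-ABC-03-00001" are hypothetical and chosen only for illustration; the pattern assumes a two-letter country code, an alphanumeric label code, and numeric year and recording fields.

```python
import re

# country (2 letters) + label (3 alphanumerics) + year (2 digits) + recording (5 digits)
ISRC_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})$")


def parse_isrc(text: str) -> dict:
    """Strip dashes and spaces, validate, and split an ISRC into its fields."""
    compact = text.replace("-", "").replace(" ", "").upper()
    match = ISRC_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a valid ISRC: {text!r}")
    country, label, year, recording = match.groups()
    return {
        "compact": compact,  # the form to type into the DAW
        "country": country,
        "label": label,
        "year": year,
        "recording": recording,
    }


print(parse_isrc("US-ABC-03-00001"))
```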

X. What’s special about the PMCD? 

The term PMCD was invented by Sonic 
Solutions as a method of allowing glass masters to be
cut directly from CDRs. However, as of this date I
doubt there are any plants which continue to cut
glass masters using the PMCD method, since Doug
Carson systems introduced a different system which 
allows glass masters to be cut from any standard 
pressing or CDR. So there’s nothing special 
anymore about PMCD and most mastering 
engineers who may write "PMCD" on the label are 
probably creating standard orange-book CDRs. 

XI. The writable DVD confusion

These are relatively new media and there is 
much confusion over their capabilities and
distinctions, and the "standards" are in a state of
flux. I advise clients to send mix files on CD-ROM
even though DVD could save a few discs, because CD
ROMs are still the most compatible with typical 
readers. 

DVD-RAM is a rewritable medium, claiming
100,000 rewrite cycles, but most existing DVD
players cannot read DVD-RAM discs. DVD-RW can
be read on more players, and the technology is
limited to 1000 rewrite cycles. Look for a player
labeled RW-compatible. The recorders for each
format require specific blanks, which are not
interchangeable. DVD-R can be played on most set-
top DVD players, yet has difficulty with older
computer drives. DVD-R can only be written once,
not a problem since the costs of blanks have become 
affordable. 

XII. Mastering for Vinyl or Cassette 

The full considerations required for vinyl and 
cassette mastering require more space than is 
available in this book. These days, most mastering 




engineers do not have a cutting lathe, and should let 
the experts do the final processing for vinyl, which 
usually includes narrowing the separation at the
bass end to protect the groove excursion, and some
high-frequency limiting to protect the cutterhead.
The LP cutting engineer will also determine the 
level of the record; there is nothing a mastering 
engineer making a DAT or CDR can do about the 
absolute level of the final vinyl. When making 
masters for vinyl, the one thing to be concerned
about is duration, especially when there is a lot of 
bass on the record. A ten-minute side is usually no 
problem when there is heavy bass. It’s technically 
possible to put a half an hour on an LP side, but 
almost inevitably with loss of level, stereo
separation and/or bass. 

The cassette replication house may not have a 
skilled mastering engineer, so the original digital 
engineer should make a special premaster for 
cassette, following the processing and level
guidelines in Chapter 15. Try to make Side A the
longer side, otherwise in the car there will be an 
irritating pause in the music at the end of side A. 

XIII. Low Level vs. High Level Hard Disc 
Formatting 

Most operating systems and disc utilities
provide an option to format a hard disc, but the 
engineer should be aware that there are two 
different degrees of formatting: low level and high 
level. High level formatting is the most common 
type. High level formatting installs the operating 
file system and a new directory, e.g., Mac HFS or
FAT32, and is the most reliable way to initialize
(remove and erase) the directories on a disc. It
should take only a couple of minutes to high-level
format a hard disc of any size. Note that high level
formatting does not erase the whole disc; your old
files are probably still there and a clever thief can
find traces of them even though the old directory is
gone, as long as the old files have not been written
over. 

Low-level formatting completely erases a hard
disc, and thus may take from several minutes to 
several hours depending on the size of the disc. Low 
level formatting reinitializes the sectors and 
compensates for physical changes in the disc as it 
ages, and it also maps out bad sectors that have 
errors. It’s a good way to check out any suspect hard 
disc, and a good thing to do to rejuvenate a drive 
that’s a couple of years old and in apparently good
shape. Read the error reports afterwards to see how 
many sectors or blocks were mapped out, for 
anything more than, say, 5 to 10, indicates the disc
is on its way to IBM heaven. 

XIV. Digital Monitor Controls vs. Analog 

As we learned in Chapter 16, some analog 
systems perform better than digital, and vice versa. 
Digital monitor level controls used to sound quite
poor, but a few well-designed systems sound as 
transparent as their analog counterparts, provided 
that we use low amounts of attenuation. A well- 
designed high resolution digital level control 
correlated with RP 1400 gains (see Chapter 14) will 
not require much attenuation to produce a proper 
loudness. Prior to purchase, test the proposed 
system’s distortion using an FFT and also by careful 




listening comparisons to an analog-based system, at
equal loudness. The same goes for D/A converters
with built-in digital monitor level controls; some
sound extremely transparent, and others quite 
grainy due to quantization distortion. 
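
One way to see why low amounts of attenuation matter is to count how far a digital level control pushes the signal toward the bottom of a fixed-wordlength output. The sketch below is a simplification of my own (it ignores dither and internal processing precision): every 6.02 dB of attenuation shifts the signal down by one bit, and a 16-bit source in a 24-bit output path has only the extra 8 bits of room before its lowest bit falls off the end.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB


def bits_shifted(attenuation_db: float) -> float:
    """How many bits an attenuation moves the signal toward the noise floor."""
    return attenuation_db / DB_PER_BIT


def footroom_db(source_bits: int, output_bits: int) -> float:
    """Attenuation available before the source's lowest bit drops below the output's."""
    return (output_bits - source_bits) * DB_PER_BIT


print(f"20 dB of attenuation shifts the signal {bits_shifted(20):.1f} bits down")
print(f"16-bit source, 24-bit output: {footroom_db(16, 24):.1f} dB of room")
```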




1 The SMPTE is developing a universal standard to replace video reference. This
proposed standard defines a format for transmitting a universal date and time
reference, called the Absolute Time Reference (ATR), for the purposes of
distributing synchronization information and for the distribution of time. This
is under the jurisdiction of the SMPTE group as well as the AES working group
SC-02-05.

2 S/PDIF stands for Sony Philips Digital Interface, which grew up into the
IEC 60958 standard, which supersedes IEC 958. Officially, type 1 is consumer,
with the consumer bitstream (protocol) on unbalanced RCA or Toslink optical
connectors. Type 2 is professional, with the professional bitstream over XLR
balanced connectors. There is also the AES3-ID standard, which transmits the
professional bitstream over a 75-ohm BNC connector at 1 volt p-p. However, as
this chapter points out, the devil is in the details.

3 An internet search for IEC 958 yielded this resourceful URL:
http://www.epanorama.net/documents/audio/spdif.html, which includes
some balancing and unbalancing circuits. However, most of the time, I
recommend using an official RS-422 receiver/transmitter chip as the
common-mode rejection will be superior.

4 Digital Domain manufactures a simple channel-status bit modifier/analyser
known as the FCN-1. More sophisticated analysers can be obtained from Audio
Precision, Neutrik, Prism, Audio Digital Technology (ADT) and others.

5 Emphasis, also known as preemphasis, is an equalization curve. If emphasis is
off, then the recording and playback are both made flat. If emphasis is on, then
the recording has a specified high-frequency boost and the playback a
corresponding high-frequency cut, intended to improve signal to noise ratio.
However, since the SNR of flat 16-bit is more than adequate, and since
headroom is reduced when emphasis is used, the practice has been pretty
much abandoned. Furthermore, the flag which pertains to this has been
abused by the pro-consumer conflict and has fallen into disfavor. If you
suspect a recording has been made with emphasis, it is advisable to re-equalize
it (roll off the highs).

6 We all owe a great debt to mastering engineer Glenn Meadows for having
worked with Plextor to produce drives which meet the needs of audio
engineers, and informing the audio community of their performance.



WE’LL FIX IT IN THE SHRINKWRAP.


— Frank Zappa 



PART IV: OUT OF THE JUNGLE


EVERY DAY IN EVERY WAY I'M GETTING BETTER AND BETTER


— Bob's Mum



Chapter 21

Education, Education, Education


What Have We Learned? 

As we reach the end of this book, it has become
clear that the craft of Mastering requires
tremendous attention to detail, technical and
musical knowledge, plus the ability to get along well
with a wide range of people, from artists to record
company executives. In other words, able to leap over
tall buildings in a single bound. But since Superman is
not available, humans have to substitute, and all we
can do is try to reach an ideal. Nobody’s perfect; we
make mistakes all the time, and the trick is to
correct them before the product goes out. That’s
what a system of quality control is about: reducing
the level of mistakes until they’re below the radar. All
we can do is try to measure ourselves against the
tough words my Mother taught me: "Every day in
every way I’m getting better and better."

Another area we've stressed is that good-quality
mastering requires a dedicated room with refined
acoustics and accurate reproduction. But with good
audio equipment and a talented engineer, a typical
project studio can produce a good-sounding master,
although with noisy fans, low-resolution monitors,
and interfering console and rack surfaces, the work
involves a time-consuming, trial-and-error process.
Check the material in as many alternate environments
as possible. Project studios wishing to do
mastering ought to construct a dedicated room for
that purpose and hire an engineer inclined to the
skills of mastering. However, if for economic reasons
you must master your own mix,* and in a less-than-
ideal environment, then use as much of this book’s
advice as possible. Also, master with the aid and
* Perhaps misguided, since cutting corners at the last stage before producing a
record may prove very costly in the long run.





perspective of an experienced producer present, or
an objective professional whose ears you trust.
Another person’s opinion will ensure that you aren’t
so close to the material that you're missing something
essential, especially if you are the artist. For example,
if you know the lyrics by heart, then you are probably
the wrong person to judge the vocal level! This is why
mastering engineers avoid mastering their own
mixes; I try to go to another engineer to master work
that I have mixed, to get their valuable perspective.
Mastering your own album is like marrying your first
cousin. You never know how the children will turn out,
or maybe you do!*

Without collaboration the music is
not being given its full potential. There
is a reason that you have the talent,
the engineer and the producer, because
each one can worry about their own
thing and they can collaborate on the
final outcome. When music is done in a
virtual vacuum it does not sound as
good.†

The Cure for the Ear 

Our critical listening audience is diminishing, 
because the average hearing acuity of the modern 
listener has been getting worse, decade by decade! 
Living and working in the city causes a threshold 
shift in our hearing sensitivity, and exposure to 
high-level, distorted music in clubs can cause 
permanent hearing damage. The ear contains tiny, 
delicate parts which can take only so much battering 

* One mastering engineer likens mastering your own mix to giving yourself your
own haircut!

† Tom Bethel, from the Mastering Engineer’s Webboard.


before they give up. Club owners should be required 
to pass out ear plugs to customers walking in the 
door, because alcohol dulls all our senses and we 
don’t notice that our ears are being bombarded. It's 
the physical equivalent of sticking thousands of 
needles in our arms and legs all night, but ignoring 
the pain! Our job as audio professionals is to
educate our audience to these very real dangers.

When clients are going to be driving or flying a 
long distance to the mastering session, I advise
them to wear ear plugs, or any ear protection— 
cotton or tissue is better than nothing. This greatly 
reduces the fatigue of traveling. I suggest they travel 
the night before and get a good night’s sleep locally 
before the session, which reduces their threshold 
shift, and improves their perception during the
mastering. 

The Cure For Our Art: Think Long-Term 

We need to educate record companies that a
singles-oriented approach is self-defeating. It looks
good for the quarter’s bottom line, but leaves no
equity for the future. Instead, they should cultivate
artists who have staying power, and long-lasting value.

The Cure For Stress: Dynamic Range 

Not every recording benefits from having
dynamic range, but I feel that recent trends in pop
music recording have taken the fatigue of slamming
it against the wall all the time to an extreme. So I'd
like to briefly discuss the phenomenon and ways in
which we can educate people to see just what they
have been missing. These days, audio and visual
media are perceived as advertising, continually
trying to get our attention. This bombardment is
very stressful, and because of that, we tune it out,
turning it into audio wallpaper, just noise to us.

While not an advertisement, a club where records
are spun is singles-oriented, and in that context,
relentless, rhythmic sound may work; the dance
exercise is also stress-relieving and very exciting,
though I don’t know how single people can meet if
they can’t hear each other over the music! But
beyond the singles and party environments, a
musical record album is not an advertisement for
itself; it’s (hopefully) a work of art. Fortunately
there’s a large crossover where music which is
suitable for dancing also makes an enjoyable sit-
down listening experience. But what enriches the
sit-down experience is a well-programmed album
with artfully-used dynamic range, fast and slow
numbers, loud and soft pieces, which exercises our
senses and may relieve stress more than relentless
banging for an hour.

The problem is that dynamic range in pop 
music has become an increasingly rare 
phenomenon, due to the fruitless volume wars and 
pressure from A&R to make a record that can get
through the noise-to-signal ratio of restaurants, 
record-store kiosks, car-play, etc. Years ago, music 
in cars was heard only from the already-compressed 
radio, and at home we listened to record albums. But 
today’s public listens to CDs in the noisy car, and at 
home does more casual and background listening 
than before, so the number of critical listeners is a 
smaller part of the total audience. Some of the 
public has gotten lazy and expects their CD changer
to perform like a radio, keeping a constant loudness
with each CD. This makes some anxious record
producers ask for compression to the extent where
sonic quality is damaged. Ironically, new sound 
palettes (such as shred) have been discovered out of 
the distorted processes we use to make things 
louder, but let’s hope not all music has to go this 
way! The answer is EDUCATION... 

We need to educate producers that fatiguing,
hypercompressed CDs will not be auditioned more
than once, so the record loses critical word-of-mouth
advertising. Teach them that a decent amount of
dynamic range helps make an album more
enjoyable, lively, even clearer in most cases, and
that sound quality suffers as the average level goes
up. Teach them that hypercompression is
incompatible with radio and lossy (MP3) encoding
(see Chapter 10, Appendix 1). Ironically, the
recordings which sound loudest and most
impacting on the radio are usually the ones which
have the lowest absolute CD loudness.

We need to educate the public that it is normal
to adjust the volume control from CD to CD, and to
turn it up and down in a noisy car. Teach them to use
the compressor button if they’re annoyed by riding
levels. Unfortunately, the disappearance of the
audio cassette has removed our opportunity to
release in two formats, one with reduced dynamic
range; all the new formats have tremendous
dynamic range capability. So now we have to turn to
solutions in the consumer equipment, including
metadata (Chapter 15), if it ever catches on.

We need to educate car audio equipment
manufacturers that recordings are and should be
variable in their levels, so a compressor should be
an essential part of every noisy car’s system. Some
sophisticated cars have automatic level controls tied
with the speed and ambient level, which is a
tremendous engineering advance. When most cars
have this equipment, there will be less producer
demand to overcompress material.

We need to educate home audio equipment
manufacturers that all CD and DVD changers
should have compressor buttons. Call it the party
button. As we move into media that accept metadata,
such as DVD and DVD-A, manufacturers should
include ergonomic controls that look at dialnorm
levels (see Chapter 15), permitting casual listeners
to switch media without riding the volume control.

We need to educate new mastering
engineers by teaching them to study the great-
sounding pop recordings of yesteryear. Many of
today’s hypercompressed recordings sound worse
than ’60s and ’70s analog recordings and have much
less dynamic range. Yet the older pop recordings
play well everywhere, again illustrating that
hypercompression is unnecessary.

The Cure for Hypercompression: How Loud Should I
Make It?

Not all producers and engineers will master the
concepts of the K-System (Chapter 15), but I ask
mastering engineers and producers to please
consider How Loud Should I Make It? An acceptable
answer could be: Turn it up until it sounds bad and
then back it off by several dB.


Actually, the best way to "win" the loudness race
is to be far from first place. Be prepared to be at
least 3 dB lower than "the winner" if you want your
record to even sound acceptable! And considerably
lower if you are looking for an open, clear,
dimensional sound.

I have never lost a job by suggesting to a
producer that I have already mastered a record as
hot as it should be; most producers appreciate the
advice of an experienced professional. If he prefers
differently, then of course I turn it up, for the
customer is always right. But mastering engineers
should gain the producer’s confidence; it’s often
useful to demonstrate the sonic deterioration
if a recording is turned up any further. Then on the
next record you do together, he will
(hopefully) accept your word that the record is
mastered as hot as it should be. This bit of education
and effort is one sure way that we can combat the
sound-ruining loudness war.


"Be prepared to be at least 3 dB
lower than the current 'winner' if
you want your record to even sound
acceptable!"




PART V: APPENDICES 


[ Appendix 1 ] 

Radio Ready: The Truth 


I. Introduction 

Radio, like all technology, is constantly 
changing. Digital radio will eventually change the 
way that our records sound, and we now have to 
contend with low-fidelity Internet radio. But for the 
immediate future, most of our recordings will be 
reproduced on standard analog FM radio. Have you 
ever wondered what happens to your recording 
when it is played on the radio? Ever wondered how
to get the most out of radio play? I am pleased to
introduce the guest authors who have largely written
this section: Bob Orban and Frank Foti.* Both of
them are considered world authorities on
radio processing. Bob is the engineer and designer
of the Optimod line of audio processors, while
Frank, who has an extensive radio engineering
background, is the creator and lead designer of the
Omnia product line. Together, their products are
used by nearly every radio station around the world.


In 2000, participants in the Mastering Webboard
engaged in a friendly collaboration to find out
what range of levels we are using. Engineer Tardon
Feathered of San Francisco put a rock and roll mix
on his FTP site, which was then downloaded by a
great number of mastering engineers, mastered, and
uploaded back to Tardon, who then made a two-CD
collection called "What Is Hot?" The absolute
loudness of the cuts on this compilation ranges from
extremely hot and highly distorted (monitor turned
down to about -14 ref. RP 200) to very light (monitor
position about -5), a loudness difference of 9 dB!

After "What Is Hot?" came out, the Webboard
participants felt that it would be important to
demonstrate what happens to these cuts when
passed through radio processing. Enter Bob Orban,
who volunteered to process the music with typical
radio station presets. Tardon then produced a
compilation CD comparing the songs before and
after radio processing.

This next figure, courtesy of Tardon, is a
comparison of several sample mastered cuts before
and after Orban processing.



* Robert Orban, Orban Inc. (A CRL Company); Frank Foti, Omnia Audio


At top, five mastered cuts of the 
same music, with increasing 
loudness and visual density. At 
bottom, the same cuts passed 
through the Orban radio 
processor.1














Notice that regardless of the original level, after
radio processing every source cut ends up with
similar apparent density: soft passages are raised
radically, and loud passages slammed to a maximum
limit. I’ve auditioned this revealing comparison CD:
Every track ends up at the same loudness, proving
beyond a shadow of a doubt that there is no
advantage to extreme compression in mastering
when a cut ends up on the radio. I also observed that
the radio processing severely distorted just about all
the originals, except for the softest track, which
came in at about a K-14. The rightmost and most
squashed source track was unlistenable after it was
processed through the Orban. The radio processing
also somewhat randomizes the stereo image and
lowers the high end, but listening revealed that
adding severe highs in mastering only aggravated
the distortion; it did not help the clarity of the final
product. Let’s hear what Bob Orban and Frank Foti
have to say about what's inside the box...

II. What Happens to My Recording When
It’s Played on the Radio? by Robert
Orban and Frank Foti*

Few people in the record industry really know
how a radio station processes their material before 
it hits the FM airwaves. This article’s purpose is to 
remove the many myths and misconceptions 
surrounding this arcane art. 

Every radio station uses a transmission audio 
processor in front of its transmitter. The 
processor’s most important function is to control 


* Edited and adapted from a 2001 AES presentation.


the peak modulation of the transmitter to the legal
requirements of the regulatory body in each
station's nation. However, very few stations use a
simple peak limiter for this function. Instead, they
use more complex audio chains. These can
accurately constrain peak modulation while
significantly decreasing the peak-to-average ratio of the
audio. This makes the station sound louder within
the allowable peak modulation.
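
To make the peak-to-average ratio concrete, here is a minimal sketch that measures it as peak-to-RMS in dB and shows how flat-topping a test tone lowers it. The function names, the 1 kHz tone and the 0.5 clip threshold are illustrative assumptions, not anything specified by the authors, and a real broadcast chain reduces the ratio by far more elaborate means.

```python
import math

def crest_factor_db(samples):
    """Peak-to-average ratio of a block, taken here as peak-to-RMS in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        raise ValueError("signal is silent; the ratio is undefined")
    return 20 * math.log10(peak / rms)

def hard_clip(samples, threshold):
    """Flat-top every sample that exceeds +/- threshold."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def sine(freq_hz, sample_rate, n):
    """n samples of a full-scale sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    tone = sine(1000, 48000, 48000)
    print(f"clean sine:     {crest_factor_db(tone):.2f} dB")
    print(f"clipped at 0.5: {crest_factor_db(hard_clip(tone, 0.5)):.2f} dB")
```

With the peak held at the same legal limit, a lower ratio means a higher average level, which is the sense in which the station "sounds louder".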

Garbage In—Garbage Out 

Manufacturers have tuned broadcast processors
to process the clean, dynamic program material that
the recording industry has typically released
throughout its history. (The only significant
exception that comes to mind is 45-rpm singles,
which often were overtly distorted.) Because these
processors have to process speech, commercials,
and oldies in addition to current material, they can’t
be tuned exclusively for "hypercompressed,"
distorted CDs. Indeed, experience has shown that
there’s no way to tune them successfully for material
which has arrived so degraded.

For 20 years, broadcast processor designers
have known that achieving highest loudness
consistent with maximum punch and cleanliness
requires extremely clean source material. For more
than 20 years, Orban has published application
notes to help broadcast engineers clean up their
signal paths. These notes emphasize that any
clipping in the path before the processor will cause
subtle degradation that the processor will often
exaggerate severely. The notes promote adequate
headroom and low distortion amplification to
prevent clipping even when an operator drives the
meters into the red.

About 1997, we started to notice CDs arriving at
radio stations that had been pre-distorted in
production or mastering to increase their loudness.
For the first time, we started seeing frequently
recurring flat-topping caused by brute-force
clipping in the production process. Broadcast
processors react to pre-distorted CDs exactly the
same way as they have reacted to accidentally
clipped material for more than 20 years: they
exaggerate the distortion. Because of phase rotation,
the source clipping never increases on-air
loudness; it just adds grunge.

The authors understand the reasoning behind 
the CD loudness wars. Just as radio stations wish to 
offer the loudest signal on the dial, it is evident that 
recording artists, producers, and even some record
labels want to have a loud product that stands out
against its competition in a CD changer or a music 
store's listening station. 

In radio broadcasting this competition has 
existed since about 1975,* when radio stations used 
simple clipping to get louder, and this technique has 
now migrated to the music industry. The figure at 
right shows a section of a severely clipped waveform 
from a contemporary CD. 

The area marked between the two pointers 
highlights the clipped portion. This is one of the 
roots of the problem as described in this paper; the 
other is excessive digital limiting that does not
necessarily cause flat-topping, but still removes 
transient punch and impact from the sound. 
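
One simple way to spot the flat-topping described above is to look for runs of consecutive samples that sit at the block's peak value. The sketch below does that; the function name, the default run length and the tolerance are hypothetical choices for illustration, and it will not catch limiting that reduces punch without producing flat tops.

```python
import math

def find_flat_tops(samples, min_run=4, tolerance=0.0):
    """Return (start_index, length) for each run of samples stuck at the peak.

    Assumption: clipping shows up as min_run or more consecutive samples
    whose magnitude is within `tolerance` of the largest magnitude in the block.
    """
    peak = max(abs(s) for s in samples)
    runs, start = [], None
    for i, s in enumerate(samples):
        at_peak = peak > 0 and abs(s) >= peak - tolerance
        if at_peak and start is None:
            start = i
        elif not at_peak and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that is still open at the end of the block.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * i / 48) for i in range(96)]
    clipped = [max(-0.5, min(0.5, s)) for s in tone]
    print("clean sine flat tops:  ", find_flat_tops(tone))
    print("clipped sine flat tops:", find_flat_tops(clipped))
```

A clean sine touches its peak for only a sample at a time, so it reports nothing, while the clipped version reports one run per half cycle.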


The problem today is that we now have sophis- 
ticated and powerful audio processing for the 
broadcast transmission system and this processing 
does not coexist well with a signal that has already 
been severely clipped. Unfortunately, with current
pop CDs, the example shown above is more the 
norm than the exception. 


The attack and release characteristics of
broadcast multiband compression were tuned to
sound natural with source material having short-
term peak-to-average ratios typical of vinyl or
pre-1990 CDs. Excessive digital limiting of the
source material radically reduces this short-term
peak-to-average ratio and presents the broadcast
processor with a new, synthetic type of source that
the broadcast processor handles less gracefully and
naturally than it handles older material. Instead of
being punchy, the on-air sound produced from
these hypercompressed sources is small and flat,
without the dynamic contours that give music its
dramatic impact. The on-air sound resembles
musical wallpaper and makes the listener want to
turn down the volume control to background levels.



A severely clipped waveform
from a contemporary CD. 





There is a myth that broadcast processing will
affect hypercompressed material less than it will
more naturally produced material. This is true in
only one aspect: if there is no long-term dynamic
range coming in, then the broadcast processor’s
AGC* will not further reduce it. However, the
broadcast processor will still operate on the short-
term envelopes of hypercompressed material and
will further reduce the peak-to-average ratio,
degrading the sound even more.

Hypercompressed material does not sound 
louder on the air. It sounds more distorted, making 
the radio sound broken in extreme cases. It sounds 
small, busy, and flat. It does not feel good to the
listener when turned up, so he or she hears it as 
background music. Hypercompression, when 
combined with "major-market” levels of broadcast 
processing, sucks the drama and life from music. In 
more extreme cases, it sounds overtly distorted and 
is likely to cause tune-outs by adults, particularly 
women. 

A Typical Processing Chain: What Really Goes On
When Your Recording Is Broadcast

A typical chain consists of the following
elements, in the order that they appear in the chain: 

Phase rotator. The phase rotator is a chain of 
allpass filters (typically four poles, all at 200Hz) 
whose group delay is very non-constant as a function 
of frequency. Many voice waveforms (particularly 
male voices) exhibit as much as 6dB asymmetry. The
phase rotator makes voice waveforms more 
symmetrical and can sometimes reduce the peak- 

* Automatic Gain Control. A type of compression that brings up low-level passages.
See Chapter 11.


to-average ratio of voice by 3-4 dB. Because this
processing is linear (it adds no new frequencies to 
the spectrum, so it doesn’t sound raspy or fuzzy) it’s 
the closest thing to a "free lunch” that one gets in 
the world of transmission processing. 
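
The following sketch models a phase rotator as four identical first-order digital allpass sections with their 90-degree point at 200 Hz. That is one reading of "four poles, all at 200Hz"; the bilinear-transform coefficient, the 44.1 kHz sample rate and all function names are assumptions made for illustration and do not describe any particular product. It shows the property the authors rely on: the magnitude response stays at unity while the phase varies strongly with frequency.

```python
import cmath
import math

def allpass_coefficient(corner_hz, sample_rate):
    """Coefficient of a first-order allpass with 90 degrees of shift at corner_hz."""
    t = math.tan(math.pi * corner_hz / sample_rate)
    return (t - 1.0) / (t + 1.0)

def rotator_response(freq_hz, corner_hz=200.0, sample_rate=44100.0, stages=4):
    """Complex frequency response of `stages` identical allpass sections in series."""
    a = allpass_coefficient(corner_hz, sample_rate)
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    section = (a + z_inv) / (1.0 + a * z_inv)
    return section ** stages

def phase_rotate(samples, corner_hz=200.0, sample_rate=44100.0, stages=4):
    """Filter samples through the chain: y[n] = a*x[n] + x[n-1] - a*y[n-1]."""
    a = allpass_coefficient(corner_hz, sample_rate)
    out = list(samples)
    for _ in range(stages):
        x_prev = y_prev = 0.0
        stage_out = []
        for x in out:
            y = a * x + x_prev - a * y_prev
            stage_out.append(y)
            x_prev, y_prev = x, y
        out = stage_out
    return out

if __name__ == "__main__":
    for f in (50, 200, 1000, 10000):
        h = rotator_response(f)
        print(f"{f:5d} Hz: magnitude {abs(h):.6f}, "
              f"phase {math.degrees(cmath.phase(h)):8.2f} degrees")
```

Running `phase_rotate` on a recorded voice or a clipped waveform and plotting input against output is a quick way to see the waveform shape change even though the spectrum balance does not.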

There are a few prices to pay. In the good old
days when source material wasn't grossly clipped, 
the main price was a very subtle reduction in 
transparency and definition in music. This was 
widely accepted as a valid trade-off to achieve greatly 
reduced speech distortion, because the phase
rotator’s effects on music are unlikely to be heard on 
typical consumer radios, like car radios, boomboxes, 
"Walkman”-style portables, and table radios. 

However, with the rise of the clipped CD, things
have changed. The phase rotator radically changes
the shape of its input waveform without changing its
frequency balance: If you measured the frequency
response of the phase rotator, it would measure
"flat" unless you also measured phase response, in
which case you would say that the "magnitude
response" was flat and the phase response was
highly non-linear with frequency. The practical
effect of this non-linear phase response is that flat
tops in the original signal can end up anywhere in
the waveform after processing. It’s common to see
them go right through a zero crossing. They end up
looking like little smooth sections of the waveform
where all the detail is missing, a bit like a scar from a
severe burn. This is an apt metaphor for their audible
effect, because they no longer help reduce the peak-
to-average ratio of the waveform. Instead, their only
effect is to add unnecessary grungy distortion.





There has been a myth in the recording world 
that broadcast processing will modify these clipped, 
over-compressed CDs less than it will modify clean,
dynamic CDs. Thanks in part to phase rotation, this 
contention is absolutely false. In particular, any 
clipping in the source material causes nothing 
but added distortion without increasing on-air 
loudness at all. 


AGC. The next stage is usually an average-
responding AGC. By recording studio standards,
this AGC is required to operate over a very wide
dynamic range, typically in the range of 25dB. Its
function is to compensate for operator errors (in
live production environments) and for varying
average levels (in automated environments).
Average levels vary mainly because the peak-to-
average ratio of CDs themselves has varied so much
from about 1990 on. Therefore, normalizing hard
disk recordings (to use all available headroom) has
the undesirable side effect of causing gross
variations in average levels. Indeed, 1:1 transfers
(which are also common) will also exhibit this
variation, which can be as large as 15dB!*


The price to be paid is simple: the AGC will 
eliminate long-term dynamics in your recording. 
Virtually all radio station program directors want
their stations to stay loud always, eliminating the risk 
that someone tuning the radio to their station will 
either miss the station completely or will think that 
it’s weak and can’t be received satisfactorily. Radio 
people often call this effect "dropping off the dial."

AGCs can be either single-band or multiband.
If they are multiband, it’s rare to use more than

* No wonder CD changers are a predicament. See Chapter 15. [BK]


two bands because AGCs operate slowly, so "spectral
gain intermodulation" (such as bass pumping the
midrange) is not as big a potential problem as it is
for later compression stages, which operate more
quickly.

AGCs are always gated in competent processors. 
This means that their gain essentially freezes if the 
input drops below a preset threshold, preventing 
noise suck-up despite the large amount of gain 
reduction. 
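
The gating behaviour can be sketched as a slow gain control that creeps toward a target level while the input is above a gate threshold and simply holds its gain when the input falls below it. Everything numeric here (target, gate threshold, step size) is an illustrative assumption; only the roughly 25 dB operating range comes from the text, and commercial processors use their own time constants and detectors.

```python
def gated_agc(block_levels_db, target_db=-20.0, gate_db=-45.0,
              max_step_db=0.5, max_gain_db=25.0):
    """Return the gain (in dB) applied to each block by a slow, gated AGC.

    block_levels_db: measured average level of each input block, in dBFS.
    All default values are illustrative, not taken from any real processor.
    """
    gain_db = 0.0
    gains = []
    for level_db in block_levels_db:
        if level_db >= gate_db:
            # Above the gate: move slowly toward the gain that hits the target.
            wanted = max(-max_gain_db, min(max_gain_db, target_db - level_db))
            step = max(-max_step_db, min(max_step_db, wanted - gain_db))
            gain_db += step
        # Below the gate: hold the last gain so noise is not sucked up.
        gains.append(gain_db)
    return gains

if __name__ == "__main__":
    levels = [-30.0] * 10 + [-60.0] * 10 + [-10.0] * 40
    for i, g in enumerate(gated_agc(levels)):
        if i % 5 == 0:
            print(f"block {i:2d}: input {levels[i]:6.1f} dBFS, gain {g:+5.1f} dB")
```

Without the gate, the quiet stretch in the middle of the example would drive the gain toward its maximum and lift the noise floor along with it.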

Stereo Enhancement. Not all processors 
implement stereo enhancement, and those that do 
may implement it somewhere other than after the 
AGC. (In fact, stand-alone stereo enhancers are 
often placed in the program line in front of the
transmission processor.) 

The common purpose of stereo enhancement is 
to make the signal stand out dramatically when the 
car radio listener punches the tuning button. It’s a 
technique to make the sound bigger and more 
dramatic. Overdone, it can remix the recording.
Assuming that stereo reverb, with considerable L-R
energy, was used in the original mix, stereo
enhancement, for example, can change the amount
of reverb applied to a center-channel vocalist. The
moral? When mixing for broadcast, err on the "dry"
side, because some stations’ processors will bring
the reverb more to the foreground.*

Because each manufacturer uses a different
technique for stereo enhancement, it’s impossible
to generalize about it. The only universal constraints
are the need for strict mono compatibility (because
FM radio is frequently received in mono, even on
"stereo" radios, due to signal-quality-triggered mono
blend circuitry), and the requirement that the
stereo difference signal (L-R) not be enhanced
excessively. Excessive enhancement always 
increases multipath distortion (because the part of 
the FM stereo signal that carries the L-R 
information is more vulnerable to multipath). 
Excessive enhancement will also reduce the 
loudness of the transmission (because of the 
"interleaving” properties of the FM stereo composite 
waveform, which we won’t further discuss). 

These constraints mean that recording-studio- 
style stereo enhancement is often incompatible with 
FM broadcast, particularly if it significantly 
increases average L-R levels. In the days of vinyl, a
similar constraint existed because of the need to
prevent the cutter head from lifting off the lacquer,
but with CDs, this constraint no longer exists.
Nevertheless, any mix intended for airplay will yield
the lowest distortion and highest loudness at the
receiver if its L-R/L+R ratio is low. Ironically, mono
is loudest and cleanest!
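
A mix engineer can get a rough feel for the L-R/L+R ratio with a few lines of code. The sketch below compares the energy of the difference signal with the energy of the sum signal over a block of samples; the function name and the choice to express the result in dB are assumptions for illustration, and the text does not specify how low the ratio should be.

```python
import math

def side_to_mid_db(left, right):
    """Ratio of L-R energy to L+R energy for a stereo block, in dB.

    Returns -inf for pure mono (no L-R) and +inf when L+R cancels completely.
    """
    if len(left) != len(right):
        raise ValueError("left and right must be the same length")
    mid_energy = sum((l + r) ** 2 for l, r in zip(left, right))
    side_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    if side_energy == 0:
        return float("-inf")
    if mid_energy == 0:
        return float("inf")
    return 10 * math.log10(side_energy / mid_energy)

if __name__ == "__main__":
    left = [0.5, -0.25, 0.1, 0.3]
    print("mono:          ", side_to_mid_db(left, left))
    print("left only:     ", side_to_mid_db(left, [0.0] * len(left)))
    print("polarity flip: ", side_to_mid_db(left, [-s for s in left]))
```

More negative values mean less difference-signal energy, which by the authors' account is what gives the lowest distortion and highest loudness at the receiver.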

Equalization. Equalization may be as simple as 
a fixed-frequency bass boost, or as complex as a 
multi-stage parametric equalizer. EQ has two 
purposes in a broadcast processor. The first is to 
establish a signature for a given radio station that 
brands the station by creating a "house sound." The
second purpose is to compensate for the frequency
contouring caused by the subsequent multiband 
dynamics processing and high frequency limiting. 
These may create an overall spectral coloration that 
can be corrected or augmented by carefully chosen 
fixed EQ before the multiband dynamics stages. 


Multiband Compression and Limiting.
Depending on the manufacturer, this may occur in
one or two stages. If it occurs in two stages, the
multiband compressor and limiter can have
different crossovers and even different numbers of
bands. If it occurs in one stage, the compressor and
limiter functions can "talk" to each other,
optimizing their interaction. Both design
approaches can yield good sound and each has its
own set of tradeoffs.

Usually using anywhere between four and six 
bands, the multiband compressor/limiter reduces 
dynamic range and increases audio density to 
achieve competitive loudness and dial impact. It's 
common for each band to be gated at low levels to 
prevent noise rush-up, and manufacturers often 
have proprietary algorithms for doing this while
minimizing the audible side effects of the gating. 

An advanced processor may have dozens of
setup controls to tune just the multiband
compressor/limiter. Drive and output gain controls
for the various compressors, attack and release time
controls, thresholds, and sometimes crossover
frequencies are adjustable, depending on the
processor design. Each of these controls has its own
effect on the sound, and an operator needs
extensive experience if he or she is to tune a
broadcast multiband compressor so that it sounds
good on a wide variety of program material without
constant readjustment. Unlike mastering in the
record industry, in broadcast there’s no mastering
engineer available to optimize the processing for
each new source!




Pre-Emphasis and HF Limiting. FM radio is
pre-emphasized at 50 microseconds or 75
microseconds, depending on the country in which
the transmission occurs. Pre-emphasis is a
6dB/octave high frequency boost that's 3 dB up at
2.1 kHz (75 µs) or 3.2 kHz (50 µs). With 75 µs pre-
emphasis, 15 kHz is up 17 dB!
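
These figures follow from treating pre-emphasis as an ideal first-order network with time constant tau, whose boost is 10*log10(1 + (2*pi*f*tau)^2) dB and whose 3 dB point is 1/(2*pi*tau). The sketch below, with hypothetical function names, evaluates that textbook formula; real transmitters and processors may deviate from the ideal curve near the top of the band.

```python
import math

def preemphasis_corner_hz(time_constant_us):
    """Frequency at which a first-order pre-emphasis curve is 3 dB up."""
    return 1.0 / (2 * math.pi * time_constant_us * 1e-6)

def preemphasis_boost_db(freq_hz, time_constant_us):
    """Boost of an ideal first-order pre-emphasis network at freq_hz."""
    w_tau = 2 * math.pi * freq_hz * time_constant_us * 1e-6
    return 10 * math.log10(1.0 + w_tau ** 2)

if __name__ == "__main__":
    for tau in (75, 50):
        corner = preemphasis_corner_hz(tau)
        print(f"{tau} us: 3 dB point near {corner:.0f} Hz")
        for f in (1000, 5000, 10000, 15000):
            print(f"    {f:5d} Hz: +{preemphasis_boost_db(f, tau):.1f} dB")
```

The steep rise above 5 kHz in the printed table is the reason bright mixes give the processor's high frequency limiter so much work to do.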

Depending on the processor’s manufacturer,
pre-emphasis may be applied before or after the
multiband compressor/limiter. The important thing
for mixers and mastering engineers to understand
is that putting lots of energy above 5 kHz creates
significant problems for any broadcast processor
because the pre-emphasis will greatly increase this
energy. To prevent loudness loss, the processor
applies high frequency limiting to these boosted
high frequencies. HF limiting may cause the sound
to become dull, distorted, or both, in various
combinations. One of the most important
differences between competing processors is how
effectively a given processor performs HF limiting
to minimize audible side effects. In state-of-the-art
processors, HF limiting is usually performed
partially by HF gain reduction and partially by
distortion-cancelled clipping.

Clipping. In most processors, the clipping
stage is the primary means of peak limiting. It’s
crucial to broadcast processor performance.
Because of the FM pre-emphasis, simple clipping
doesn’t work well at all. It produces difference-
frequency IM distortion, which the de-emphasis in
the radio then exaggerates. (The de-emphasis is flat
below 2-3 kHz, but rolls off at 6dB/octave
thereafter, effectively exaggerating energy below 2-3
kHz.) The result is particularly offensive on cymbals
and sibilance ("essses" become "efffs").

In the late seventies, one of the authors of this 
article (R.O.) invented distortion-cancelled 
clipping. This manipulates the distortion spectrum 
added by the clipper’s action. In FM, it typically 
removes the clipper-induced distortion below 2 kHz 
(the flat part of the receiver’s frequency response). 
This typically adds about 1dB to the peak level 
emerging from the clipper, but, in exchange, allows 
the clipper to be driven much harder than would 
otherwise be possible. 

Provided that it doesn’t introduce audibly 
offensive distortion, distortion-cancelled clipping 
is a very effective means of peak limiting because it 
affects only the peaks that actually exceed the 
clipping threshold and not surrounding material. 
Accordingly, clipping does not cause pumping, 
which gain reduction can do, particularly when gain 
reduction operates on pre-emphasized material. 
Clipping also causes minimal HF loss by 
comparison to HF limiting that uses gain reduction. 
For these reasons, most FM broadcast processors 
use the maximum practical amount of clipping that’s 
consistent with acceptably low audible distortion. 

Real-world clipping systems can get very 
complicated because of the requirement to strictly 
band limit the clipped signal to less than 19 kHz 
despite the harmonics that clipping adds to the 
signal. (Bandlimiting prevents aliasing between the 
stereo main and subchannel, protects subcarriers 
located above 55 kHz in the FM stereo composite 
baseband, and protects the stereo pilot tone at 19 
kHz). Linearly filtering the clipped signal to remove 
energy above 15 kHz causes large overshoots (up to 
6dB in worst case) because of a combination of 
spectral truncation and time dispersion in the filter. 
Even a phase-linear lowpass filter (practical only in 
DSP realizations) causes up to 2dB overshoot. 
Therefore, state-of-the-art processors use complex 
overshoot compensation schemes to reduce peaks 
without significantly adding out-of-band spectrum. 

Some chains also apply composite clipping or 
limiting to the output of the stereo encoder, which 
encodes the left and right channels into the 
multiplex signal that drives the transmitter. It’s 
actually the peak level of this signal that government 
broadcasting authorities regulate. Composite 
clipping or limiting has long been a controversial 
technique, but the latest generation of composite 
clippers or limiters has greatly reduced interference 
problems characteristic of earlier technology. 

Conclusions 

Broadcast processing is complex and sophis- 
ticated, and was tuned for the recordings produced 
using practices typical of the recording industry 
during almost all of its history. In this historical 
context, hypercompression is a short-term anomaly 
and does not coexist well with the "competitive” 
processing that most pop-music radio stations use. 
We therefore recommend that record companies 
provide broadcasters with radio mixes. These can 
have all of the equalization, slow compression, and 
other effects that producers and mastering engineers 
use artistically to achieve a desired "sound.” What 


these radio mixes should not have is fast digital 
limiting and clipping. Leave the short-term 
envelopes unsquashed. Let the broadcast 
processor do its work. The result will be just as 
loud on-air as hypercompressed material, but 
will have far more punch, clarity, and life. 

A second recommendation to the record 
industry is to employ studio or mastering 
processing that provides the desired sonic effect, 
but without the undesired extreme distortion from 
clipping. The alternative to brute-force clipping is 
digital look-ahead limiting, which is already widely 
available to the recording industry from a number of 
different manufacturers (including the authors’ 
companies). This processing creates lower 
modulation distortion and avoids blatant 
flat-topping of waveforms, so is substantially more 
compatible with broadcast processing. 

Nevertheless, even digital limiting can have a 
deleterious effect on sound quality by reducing the 
peak-to-average ratio of the signal to the point that 
the broadcast processor responds to it in an 
unnatural way, so it should be used conservatively. 
Ultimately, the only way to tell how one’s production 
processing will interact with a broadcast processor 
is to actually apply the processed signal to a real- 
world broadcast processor and to listen to its output, 
preferably through a typical consumer radio. 

1 These tracks were ordered according to increasing loudness using the Waves PAZ 
meter. However, the apparent waveform density implies sample #2 is louder than 
#3. Neither measurement method is perfect. 

2 Bob Ludwig (in correspondence) mentions that competition in radio 
broadcasting was already happening in the late 1960s, noting WABC "color radio" 
added EMT plate to everything to increase average density. 

3 BK: On the other hand, the other radio processing, especially the compression, 
reduces the sense of depth, plus, typical reception areas tend to lose separation, so 
improving the stereo image in mastering may not be such a bad thing. 





[ Appendix 2 ] 

The Tower of Babel 
Audio File Formats 


Platforms, Extensions and Resource 
Forks 

Macintosh files are divided into two parts, the 
data fork (which is the main part and which is 
transferable to a PC), and resource fork. Most 
Macintosh programs look for the file type in the 
resource fork, unique to Macintosh computers. The 
resource fork is the Macintosh way of telling 
programs who created a file, its file type, and 
additional information proprietary to the particular 
file type; it is analogous to the three letter extension 
on the PC (e.g. .aif, .wav). These were invented to 
allow users to double-click on a file and automatically 
open a program, an advance over the DOS 
command line. I don’t know whether the Windows 
or Mac approach is better, because both can cause 
serious headaches when things go wrong. Resource 
forks cannot be transmitted over the Internet 
(except with Mac-specific compression utilities), 
and can only be transferred between platforms in a 
limited manner. So on the Mac, if the resource fork 
is empty (e.g., if the file came from a PC) or has an 
error in it, then a simple four-letter variable may be 
all that’s keeping the audio from playing. More 
advanced programs, such as Barbabatch and 
Soundhack on the Mac, ignore the resource fork 
and look inside the data fork of the file for the 


header, which contains far more information, 
including the file type, wordlength, and sample 
rate. If a Mac program restricted to reading the 
resource fork does not recognize the file type, try 
using a file-typing program. Replace the incorrect 
value with the letters AIFF, WAV (sometimes 
WAVE), or BWF. But turn down your monitor gain 
before playing in case you chose the wrong one! 
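Rather than guessing the type, a small utility can look inside the data itself, much as the more advanced programs do. The sketch below is a minimal illustration, not how any of the programs named above actually work: it checks the container identifiers at the start of the data ("FORM"/"AIFF" or "RIFF"/"WAVE") and treats a WAVE file containing a "bext" chunk as a Broadcast Wave. The function name is ours, and the code assumes it is handed enough of the file to include the chunks that precede the audio data.

```python
import struct

def sniff_audio_type(data):
    """Guess the audio container type from the leading bytes of a file."""
    if len(data) < 12:
        return "unknown"
    container, form = data[0:4], data[8:12]
    if container == b"FORM" and form == b"AIFF":
        return "AIFF"
    if container == b"FORM" and form == b"AIFC":
        return "AIFC"
    if container == b"RIFF" and form == b"WAVE":
        # Walk the chunk list; a 'bext' chunk marks a Broadcast Wave file.
        pos = 12
        while pos + 8 <= len(data):
            chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
            if chunk_id == b"bext":
                return "BWF"
            pos += 8 + size + (size & 1)  # chunks are padded to an even length
        return "WAVE"
    return "unknown"

# Example: a tiny hand-built WAVE header with only an (empty) 'fmt ' chunk body.
header = b"RIFF" + struct.pack("<I", 28) + b"WAVE" + b"fmt " + struct.pack("<I", 16) + bytes(16)
print(sniff_audio_type(header))
```

If the result is "unknown", the file may be raw PCM or something else entirely, so the advice about turning down the monitor gain still applies.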

When transferring files between platforms, the 
WAV, AIFF and BWF file types (described below) 
are the most universal, because they do not depend 
on resource forks for anything except file type, and 
the file type is also duplicated within the Header (in 
the data fork) if the resource fork is missing. We 
often receive files on Macintosh-formatted CD or 
DVD-ROMs, and these may be read on a PC using a 
simple system addition such as MacOpener. 
MacOpener reads the resource fork on the CD-ROM 
and uses a table (user-configurable) that 
automatically supplies an appropriate file 
extension; you can tell the process is working 
because Windows will supply the icon for that file 
type. I do not know of a way to mount a Mac-format 
hard disc on Windows and read the resource fork. 
However, it is a blessing that the SADiE (through 
ver. 4) proprietary SCSI bus can read and write to 
all common audio formats as well as Mac (including 




resource fork) and PC-formatted hard discs. In fact, 
SADiE can freely intermix file formats and 
wordlengths within its EDL, also a blessing. Sonic 
Solutions has historically been a closed platform, 
but Sonic Solutions HD 1.7 can read AIFFs, and 16- 
bit (not 24) WAVs; the only format it can write is 
AIFF. This necessitates frequent use of a universal 
conversion application such as Barbabatch on the 
Mac to exchange files between Sonic and the rest of 
the world. Barbabatch also performs excellent 
sample rate and wordlength conversion as well as 
batch renaming and splitting regions within files if 
desired, and acceptable dithering. 

SADiE identifies the file type by the file 
extension on PC-formatted discs, and the resource 
fork on Mac-formatted discs. If a file somehow 
arrives on a PC with no extension,* try adding the 
extension, but turn down your monitor gain before 
playing! When in doubt as to the type, try adding the 
extension .raw or .pcm and tell a program which can 
read raw files (such as Wavelab) the suspected 
wordlength and sample rate and attempt to play the 
raw file. From there it may be transferred via 
AES/EBU into SADiE, for example. But, again, watch 
out for full scale white noise if you guessed wrong! 
Conversely, if you add an extension to a Mac file 
while it is on the Mac (or accidentally use a . 
character in any Mac file name), when it eventually 
gets to SADiE it may end up with an extra extension 
to its name, or SADiE will get confused as to the file 
type, or the PQ list may say Love Me Do.aif instead of 
just Love Me Do. The lesson is not to add extensions 
to Mac-formatted discs and let the smart utilities do 
their thing. 

* In Windows, turn on the option which lets you view the extensions. 


File Formats: Non-Lossy 

There are four popular audio file formats in 
current use: AIFF, WAVE, BWF and Sound 
Designer II (SD2). 

AIFF 

Audio Interchange File Format supports 
standard bit resolutions in multiples of 8, up to 32 
bits fixed point, although most AIFFs are 16-24 bits. 
While most professional PC programs can read and 
write AIFFs, this format was created for use on 
Macintosh computers. A mono or split AIFF 
contains one channel, as opposed to interleaved 
AIFFs, which can contain multiple channels. We 
prefer to receive interleaved files wherever 
possible, because it is easier to group them and 
prevent interchannel time-slippage. There is 
reportedly a floating-point AIFF file type, but as of 
this writing, the high-end mastering programs 
interchanging data insist on fixed-point notation. 
Sample rates up to 192 kHz and beyond are 
supported. On the PC, the standard extension is .aif. 
Data is stored in chunks, and manufacturers can 
write proprietary chunks. Byte order is big-Endian 
(msb) first, which is the Motorola standard, as 
opposed to little-Endian (lsb), the Intel standard. 
If a program misreads the wrong end, the result will 
be nearly full-scale white noise, a not uncommon 
result when exchanging files between platforms. 
Reversing the ends wastes one instruction cycle, so 
manufacturers are often a bit fussy about which file 
format they prefer. There is no official provision for 
time-stamping except in a proprietary 
manufacturer’s chunk. 
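To see why a byte-order mix-up sounds like noise rather than like quiet or distorted music, it helps to look at what happens to individual samples. The sketch below is only an illustration using 16-bit samples: it packs a few values most-significant-byte first (the AIFF convention) and then deliberately reads them back least-significant-byte first (the WAVE convention). The function name is ours.

```python
import struct

def misread_16bit(samples):
    """Pack samples big-endian, then wrongly unpack them as little-endian."""
    count = len(samples)
    raw = struct.pack(">%dh" % count, *samples)   # msb first, as in AIFF
    return list(struct.unpack("<%dh" % count, raw))  # read lsb first by mistake

# Very quiet samples turn into large, erratic values when misread.
print(misread_16bit([1, 256, -2]))  # [256, 1, -257]
```

The low-order byte of each sample, which in real audio changes almost randomly from sample to sample, ends up in the most significant position, so even soft material is turned into something close to full-scale noise.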





A variation of AIFF is called AIFC (short for 
AIFF-C), which employs optional lossy data 
reduction (coding) and can use floating point 
notation. I have not seen AIFC supported by a high- 
end mastering program, but I have seen the AIFC 
file type accidentally applied to a plain AIFF by Mac 
programs such as Quicktime. 

WAVE 

The WAVE file format, developed by Microsoft, 
is probably the most popular audio format, using a 
standard extension of .wav. It supports a variety of 
bit resolutions (both fixed and floating point), 
sample rates, and channels of audio. As with AIFF, 
WAVE files can be split or interleaved. There is 
provision for time-stamping in one of the standard 
SMPTE timecode formats, supported by some PC 
and Mac programs, but the BWF (see below) is more 
reliable in that respect. I recommend saving files as 
fixed-point (integer) WAVEs, as they are the most 
compatible between platforms. Understandably, 
many programs have difficulty with the several 
esoteric varieties of floating-point WAVEs. Byte 
order is little-Endian (lsb) first, most appropriate 
for Intel-based processors. 

As in the AIFF, data in WAVEs is stored in 
chunks, which can also be manufacturer-specific. 
The format has grown in a somewhat disorganized 
manner, and now supports many variant and 
sometimes unstandardized types of chunks. But the 
high-end programs seem to be successful ignoring 
the chunks that don’t make sense to them! WAVE 
data may optionally be coded (psychoacoustic 
wordlength reduction, sometimes confusingly 


called data compression), though mastering 
engineers expect that all files sent for mastering are 
linear PCM (i.e. uncoded, high-resolution). 

BWF 

The Broadcast Wave format is based on the 
Microsoft WAVE file format and continues to use the 
WAV file name extension. The EBU has added a 
"broadcast wave extension" chunk to the basic wave 
format, which contains the minimum information 
that is considered necessary for all broadcast 
applications, such as unique source identifiers, 
origination station data, etc. The EBU has legislated 
this format to be a standard of interchange, so most 
high-end mastering programs will be required to 
support it, and its built-in timestamp will be 
welcome. Files may be linear or (lossy) coded via 
MPEG-1 or -2. As of this writing, there is no 
provision for linear multichannel, so BWF 
multichannel (greater than 2) files must be lossy-coded. 
Of course, you can send multiple mono 
linear-signal-format BWFs or stereo pairs. 

Sound Designer II 

SDII format was invented by Digidesign for use 
on the Mac. SD II (or SD2) files are landmines on 
the PC, particularly because of their reliance on 
resource forks, where file type, sample rate, 
wordlength and time-stamp information are kept. 
SD2s can either be multiple-channel mono, or 
dual-channel interleaved stereo. Sample rates up to 
48 kHz are officially supported by Digidesign, 
although Mark of the Unicorn (MOTU) uses SDII 
files exclusively, up to 96 kHz, in the Macintosh 
program Digital Performer 3.0. Performer can 
import and export WAV and AIFF but unfortunately 
cannot use those file types within an EDL. SD2 can 
be written and read from within Pro Tools, but only 
up to 48 kHz as far as I know. SADiE 4.5 has a bug 
which does not permit reading interleaved SD2s, 
and since PC-formatted backup tapes cannot store 
Mac resource forks, there is no way to archive an 
SD2 session from within SADiE except by bouncing 
first to a new format. So, routinely, I convert all SD2 
files to AIFF or WAV using Barbabatch on the Mac, 
and move the removable disc over to SADiE for 
mastering. Reportedly, SD2 has been officially 
obsoleted by Digidesign, but its memory lingers on! 

Length Limitations 

A major problem with both WAVE and AIFF file 
formats is that the chunk sizes (including the overall 
chunk describing the whole file) use 32-bit 
integers, holding the size in bytes. For a quad 24-bit 
file at 96 kHz SR, the longest possible duration is 
some 3728 seconds, so you get only just over an 
hour. Go all the way to 5.1 surround at 192 kHz, 
24-bit, and the limit descends to some 20 minutes.* 
Short of inventing a new file format that can support 
longer length files, the solution is to use split files if 
interleaved format proves too long for the length to 
be correctly specified. 
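The durations quoted above follow directly from the size of the length field. The sketch below reproduces the arithmetic; it assumes the size is treated as an unsigned 32-bit byte count and ignores the few bytes taken up by headers. (Software that treats the field as signed would reach its limit at half these durations.) The function name is ours.

```python
def max_duration_seconds(channels, bits, sample_rate_hz, size_field_bits=32):
    """Longest interleaved linear PCM file that a size field of this width can describe."""
    bytes_per_second = channels * (bits // 8) * sample_rate_hz
    return (2 ** size_field_bits) / bytes_per_second

quad = max_duration_seconds(4, 24, 96000)
surround = max_duration_seconds(6, 24, 192000)
print(int(quad), "seconds for quad 24-bit at 96 kHz")             # 3728
print(int(surround // 60), "minutes for 5.1 24-bit at 192 kHz")   # 20
```

Splitting a six-channel file into six mono files multiplies the available duration by six, which is why split files are the practical workaround.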

Metafile Formats 

Metafile formats are designed to interchange all 
the information needed to reconstruct a project. 
Unfortunately, some manufacturers are reluctant to 
adopt another’s format, so this valuable effort has 
not made enough progress. 




AES-31 

The AES-31 file interchange standard was 
developed by the SC-06-01 AES standards 
committee jointly with several manufacturers. The 
goal is to interchange basic projects, timestamp and 
crossfade information as well as audio files. There 
has been some success but as of this writing the 
format is not supported by Digidesign. 

OMF 

The Open Media Format was produced by 
Digidesign to interchange Pro Tools Session and 
audio data with other workstations. At this writing, 
it is in a primitive state. The last time I tried to 
import OMF data into Digital Performer I got a fatal 
error. 

Lossy File Formats 

MP3 and ATRAC (used on the Minidisc) are 
lossy file formats, that is, some audio information 
has been sacrificed in the effort to save space and 
increase transmission speed. Once sound has been 
encoded into MP3, sound quality can never be 
restored, which is why it’s a lossy format! Since these 
have become widespread and mislabeled CD Quality 
we sometimes get them as original sources! This 
violates the source-quality rule. Whenever possible, 
we ask to have these replaced with higher-quality, 
earlier-generation sources, or the sound quality will 
obviously suffer, especially after mastering 
processing. 


* Richard Dobson, as reported in the Surround Sound mail list. 



[ Appendix 3 ] 

Preparing Tapes and Files for Mastering 


One major theme in this book has been the 
mastering engineer’s comprehensive attention to 
sequencing, spacing (aka assembly), leveling, 
clean-up and processing. The better-prepared your tape 
or file, the better we all will look. Make the best mix 
you can, then let the mastering engineer do the rest 
of the magic, including the "heads, tails, fade-ins 
and fadeouts," for if something is cut off or faded 
prematurely, it will be lost. Don't be tempted to fade 
even if there is a noise, because we have some tricks 
that can create real-sounding endings on tunes that 
everyone thought had to be faded, as described in 
Chapter 7. You can also include a "fade example," 
which we can use if this proves to be the best choice. 
Given freedom, the mastering engineer can produce 
a seamless, flowing record album from the "loose 
parts" sent by the mix engineer. Leaving the tunes 
loose also permits the mastering house the most 
flexibility to change the order of the album (if 
necessary), or produce segues in the most artistic 
fashion. 

In the last century, the most common formats 
we received for music mastering were linear, e.g., 
analog and digital tape and standalone CDR (which 
is linear for writing, but random-access for 
reading). But now the most popular formats are 
completely random-access (file-based). Here’s how 
to make a mastering engineer happy when 
submitting finished mixes on the medium of your 
choice. 

Communication 

Mastering is a collaborative process, even if you 
cannot attend the mastering session; the mastering 
engineer’s job is to realize your desires and if 
possible to go beyond your wildest dreams! Give the 
mastering engineer a call to discuss your music and 
what you think it needs. Get the mastering engineer 
involved early in the mixing process; if you work 
nearby, bring over a sample to hear on the 
high-resolution, wide-range mastering monitors. Ask 
yourself: Does it sound like music? Does it live and 
breathe? Do the climaxes sound somewhat like 
climaxes? Do the choruses have a bit more energy 
than the verses (the usual natural case)? Is the bass 
drum to bass ratio right or do you have doubts? Is 
the sound as spacious and deep as you want it to be? 
Have you checked the material on several 
alternative systems? When it comes time for the 
mastering, don’t hesitate to provide or suggest a CD 
of similar music that appeals to you, yet leave your 
mind open to the creativity of the mastering 
engineer. After the mastering session is over, you 
can listen to a reference (CD) on your own playback 
system and if desired, suggest revisions or 
improvements. 




Logs 

The logs that accompany mixes are very 
important. Thorough logging is essential: it keeps a 
project from being delayed as we don’t have to chase 
down the catalog number or other essential 
information on the mastering day. Some engineers 
forget that a CD ROM has no order. 1 So all logs 
should indicate the full title of each song, the 
corresponding abbreviated file name on disc, and 
the order the song is to appear on the final medium, 
plus your comments about fades, noises or anything 
that concerns you. Please see the example log in 
Appendix 4. 

Stems, Splits and Alternate Mixes (e.g. Vocal 
Up/Down). By all means provide alternate mixes or 
synchronized stems if possible. See Chapter 13. 

Linear Media (DAT, Analog tape, Stand-alone CDR) 

Don’t bother to reorder DAT tapes or CDRs by 
copying, because the copy process may introduce 
more trouble than the time saved at the mastering 
house (if any). Leave the tunes out of order, leave 
the outtakes and alternate mixes (which may prove 
useful), and mark all keeper takes. Don't bother to 
space the tunes on linear media other than leaving 
enough time to cue and to use leaders or program 
IDs to identify the cuts. 

When mixing to disc or digital tape, never make 
just one. Always record two at once digitally (make 
data-identical mixes labeled "A” and "B”), and hold 
onto that safety—never send the only copy in the 
mail. Record one or two minutes at the head of the 
tape with test tones or simply blank audio and begin 
the first tune after that with program ID #1. Start IDs 
do not have to be exactly placed, but they guide us to 
loading the proper tune. Remember that digital 
tapes need time to lock up: start recording on the 
mix tape, and for safety’s sake wait a full 10 seconds 
before running the multitrack (you can use the 
lockup time to lay down a verbal slate*). When 
writing to standalone CDRs, which lock up instantly, 
a second’s pause before the downbeat should be 
enough, but leave those critical breaths and noises 
in (see handles, below)! 

Tape to Tape Dubbing procedures. Always 
monitor the output of the recorder while copying. If 
you must pause a tape-based recorder during the 
dubbing process, make sure to roll in record for at 
least 10 seconds before the tune begins, to prevent 
record glitches. This means that DAT tapes dubbed 
from other DATs can never have the short spacing 
we like on an album. Learn how absolute time is 
used on DATs, and maintain continuous ABS 
throughout the various mixing sessions by using 
end search before beginning the next session or 
after any playback. 

Level Check 

As described in Chapter 5, mix with conser- 
vative levels, which is not a problem with 24-bit 
media. Print the mix with levels well under the top 
and no OVERs! I recommend -3 dBFS maximum. 
Roger Nichols reminds mix engineers using DAWs 
to visit each plug-in, reset the clip indicator and 
check the mix. If there’s a clip, then redo the mix to 
avoid internal clipping, which can cause pops and 
snaps that usually aren’t heard until mastering. 5 
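For file-based mixes, the -3 dBFS recommendation is easy to check before sending the mix off. The sketch below is a minimal example that assumes the samples have already been read into a list of signed integers; the function names and the default ceiling argument are ours. Note that it measures sample peaks only: it cannot see clipping that happened inside a plug-in earlier in the chain, nor peaks that fall between samples.

```python
import math

def peak_dbfs(samples, bits=24):
    """Highest sample peak, in dB relative to digital full scale."""
    full_scale = 2 ** (bits - 1)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak / full_scale)

def within_ceiling(samples, bits=24, ceiling_dbfs=-3.0):
    """True if no sample peak exceeds the chosen ceiling."""
    return peak_dbfs(samples, bits) <= ceiling_dbfs

mix = [0, 4194304, -100]            # peak at half of 24-bit full scale
print(round(peak_dbfs(mix), 2))     # -6.02
print(within_ceiling(mix))          # True
```

A mix that passes this check can still contain internal clips, so the advice to reset and inspect each plug-in's clip indicator stands.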


* We still appreciate having verbal slates when dealing with non-file-based media. 
A slate is a verbal identification of the title or take number. 



Preserve Data Integrity 

In general, send the earliest generation, 
unprocessed material to the mastering house—avoid 
copying or going to second-generation in a DAW. If 
you must edit, keep everything at unity gain if at all 
possible (do not normalize), even if the material is 
peaking low, as explained in Chapter 5. The same 
goes for temptation to equalize, compress, limit or 
otherwise process a mix after it has been made. If 
you must, please send both versions to the 
mastering house, because we may be able to better 
the process with our tools, or combine it with other 
processes and reduce cumulative distortion. 

Maximum CD Program Length 

Every plant specifies a maximum acceptable 
length, and some charge more for CDs over 
approximately 77 minutes. The final CD Master tape, 
including songs, spaces between songs, and 
reverberant decay at the ends of songs, must not 
exceed the limit, which at one popular plant is 
79:38. The mastering engineer can determine the 
exact time after the master is assembled. DVD 
program lengths vary because of the data coding and 
must be determined at the time of authoring. 
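A rough running total made while sequencing will warn you early if an album is heading over the limit. The sketch below is only an estimate under simple assumptions: times are given as minutes:seconds strings, each track time already includes its reverberant decay, and the default limit is the 79:38 figure quoted above for one plant. Your plant's figure may differ, and the exact time can only be read from the assembled master. The function names are ours.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def program_length_seconds(track_times, gap_times):
    """Total of all songs (including decays) plus the spaces between them."""
    return sum(to_seconds(t) for t in track_times) + sum(to_seconds(g) for g in gap_times)

def fits_on_cd(track_times, gap_times, limit="79:38"):
    """True if the estimated program length does not exceed the plant's limit."""
    return program_length_seconds(track_times, gap_times) <= to_seconds(limit)

print(program_length_seconds(["3:30", "4:15"], ["0:02"]))  # 467
print(fits_on_cd(["40:00", "39:40"], ["0:02"]))            # False
```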

Labeling tapes or discs. Which is the Master? 

Don’t forget to put a name and phone number 
on the source media in case it gets separated from 
the documentation! A DAT is not a CD master, and 
neither is a mix CDR submitted for mastering. The 
sources for an album are NOT the master; the album 
(production) MASTER is the final, PQ'd, equalized, 
edited, assembled, and prepared tape or disc that 

* NARAS has produced Master Recording Delivery Recommendations. Serious 
recordists must study http://grammy.aol.com/recommendations.pdf 


needs no further audio work, and is ready for 
replication. 3 Please label the source media: 

Submaster or Work Tape, or Mix, or Final Mix, or 
Session Tape, or Edited-Mix, or Compiled-mix, or 
Equalized Mix, to name several possibilities. This 
will avoid confusion in the future when 
looking through the tape library for the one 
and only real (production) master. 

"The Source tapes/files for an album are NOT the Production MASTER." 


Analog tape Preparation 

Begin and end the reel with some "bumper,” 
followed by leader. If possible, put leader between 
songs (except for live concerts and recordings 
edited with room tone). Tape should be slow wound, 
tails out. Label each reel as recommended in 
Appendix 5. Indicate tape speed, record level for 0 
VU in nW/m, record EQ (NAB or IEC), track 
configuration, whether it is mono, stereo or multichannel. 
Indicate if noise reduction is used and include the 
noise reduction alignment tone. Include alignment 
tones 30 seconds (or longer) each, minimum 1 kHz, 
10 kHz, 15 kHz, and 100 Hz plus (highly 
recommended) 45 Hz and 5 kHz at 0 VU without noise 
reduction. Also highly recommended is a tone 
sweep (glide) from 20 Hz through 500 Hz. Needless 
to say, the tones must be recorded by the same tape 
recorder that recorded the music, and ideally, 
recorded through the same console and cables that 
were used to make the mix. Many mastering 
engineers prefer having the tones at the tail of a reel 
or on a separate reel. 






Many historic analog tapes do not include 
proper tones and sometimes it is not possible to put 
tones on new masters. If it was not possible to lay 
down tones on the session, then we will use 
sophisticated methods to guarantee azimuth and 
equalization accuracy. 

Give Handles 

For live concerts and many other forms of 
music, it’s useful to include handles, that is raw 
footage on either side of the intended music. This 
can include out takes, unfaded applause, breaths, 
coughs, noises, speech between tracks, etc. Also 
include your production notes and desires, such as 
"please leave that ugly laugh in between songs 2 and 
3. I think it’s funny." Handles are especially needed 
when a track might have to be noise reduced, for the 
noise sample we need can sometimes only be found 
just before the downbeat. 


What Sample Rate? 

Until circa 2000, I recommended that mix 
engineers try to work at 44.1 kHz if possible, 
considering the abysmal state of sample rate 
converters. This is no longer the best advice; my 
current recommendations are for mixers to work at 
the highest practical sample rate and longest 
available wordlength. However, if you are mixing 
digitally, do not sample rate convert yourselves, 
but remain at the same sample rate 
as the multitrack. If you are mixing 
with an analog console, there is a 
marginal advantage to using a higher sample rate for 
the mixdown recorder than the multitrack. For 
example, even if mixing analog with a multitrack at 
48 kHz, you will get better results with a mixdown 
recorder at 96 kHz. 

"CD-ROM Preparation is a nest of land mines waiting to explode." 

Random Access Media: Preparing Files 

CD-ROM preparation requires attention to 
detail. It’s a nest of land mines to navigate which 
should not be taken lightly, and experience is the 
best teacher. A poorly-prepared CD ROM can waste 
a tremendous amount of time at the mastering 
house. Make sure the mastering house will accept 
the file types you want to send. I recommend you 
work around Murphy’s law by cutting a test disc and 
sending some files ahead of time that we can check 
out. Here are some critical do’s: 

• Leave blank sound at the head of the file, in 
other words, start the first music at least 1 second 
into the file, not at zero time. (This is to prevent 
glitches that often occur at the file start). 

• For stereo and multichannel, interleaved files are 
preferred, AIFF, BWF, or WAV. SDII is also 
acceptable (see Appendix 2). Ask to avoid costly 
conversion time. No MP3s, please! Start and end 
with high-resolution, linear-format sound files. 

• Try to do a single project at one sample rate. It 
involves considerable extra work to deal with 
multiple sample rates in a project and often involves 
a compromise as we must rate-convert some files to 
get a common rate for the project. But if for some 
reason your project includes different rates, 
carefully mark (log) the rate of the files for our 
information. 




• Give each file a meaningful name related to the 
song title, like Love Me Do, not some meaningless 
serial number. 

• Choose a high-quality name-brand CDR blank. In 
my experience, Taiyo Yuden, the oldest CDR 
manufacturer, continues to make the most 
compatible and reliable CDRs. 

• For lowest error rate, obtain 74-minute blanks 
from a professional supplier. Avoid the error-prone 
80s, which eliminates going into Costco on a Friday 
night to search for blanks! 

• For lowest error rate, cut at 2X to 4X speed, no 
faster. 

• Write a Fixed disc, i.e. a closed session. To verify 
the disc has been fixed, pop it into a PC or Mac CD 
reader (not a writer) and make sure it can read the 
file names. 

• DO NOT USE PAPER LABELS! Stick-on paper 
labels may look impressive, but in my experience 
they appear to increase error, perhaps by altering 
the rotational speed of the disc, and are especially 
problematic at high disc spin speeds, 
multichannels, high sample rates and wordlengths. 
Paper labels can also become partly or completely 
unglued and tear off in the CD reader, which is not a 
pretty sight! Also, do not label the disc with a ball-point 
pen, but with a soft marker, on the protected 
(overcoated) part of the top surface. 3 While I 
personally believe that the coating on professionally 
over-coated CDRs is sufficient protection from 
scratches and organic solvents (as in an aromatic 
Sharpie-brand marker), the most conservative 


mastering engineers recommend using water-based 
markers for labeling. Perhaps someone will do a 
long-term study measuring errors on CDRs with a 
coated-marked surface. 

• Write to fixed-point 24-bit files (also known as 
Integer Format). It’s unlikely that the mastering 
house can read any other format; e.g., do not use 
32-bit floating point for files. This situation may 
improve in upcoming years and we are beginning to 
have success reading Samplitude-format, one of the 
several incompatible 32-bit float file formats. 

• Use any standard sample rate up to 96 kHz. Verify 
the mastering house can use files with a higher rate 
before cutting. 

• File names should not include hyphens (-), use an 
underline instead. Do not use the / or \ character. 
For best multi-platform compatibility, stay away 
from spaces and use alphas, numerics and 
underlines only. 4 SADiE v. 4.2 has a strong aversion 
to accent marks and non-English characters, 
keeping it from generating waveforms, archiving 
and other essentials, something which we hope they 
will change. Macs are far more forgiving in this 
regard than PCs. 

• We love receiving files that include the intended
track number in their name. One trick for naming
files is to include the intended track number at the
beginning (using two digits), which makes it much
easier to assemble them in the intended order. For
example: 01 I Need Somebody, 09 I Got Rhythm,
10 She's So Fine.

• Avoid periods (dots) in Mac file names on Mac
discs because they might be transferred to PC and
be confused with extensions; use one and only one
dot on PC discs in front of the 3-letter extension.

• Verify the mastering house can read DVD-ROMs
before choosing that medium.
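The naming advice in the list above can be sketched in a few lines of Python. This is only an illustration, not a tool from the book: the function name `mastering_safe_name` and the default `wav` extension are assumptions, and the sketch simply applies the rules as stated (two-digit track prefix, letters, digits and underlines only, one dot before the extension).

```python
import re
import unicodedata


def mastering_safe_name(title: str, track: int, ext: str = "wav") -> str:
    """Return a file name built only from letters, digits and underlines.

    Hypothetical helper: prefixes a two-digit track number and keeps a
    single dot, placed in front of the extension.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of other characters (spaces, hyphens, slashes,
    # dots, apostrophes) into one underline.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", ascii_title).strip("_")
    return f"{track:02d}_{stem}.{ext}"


if __name__ == "__main__":
    for number, song in [(1, "I Need Somebody"), (9, "I Got Rhythm"), (10, "She's So Fine")]:
        print(mastering_safe_name(song, number))
```

Sorting the resulting names alphabetically then reproduces the intended track order, which is the point of the two-digit prefix.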

Split Files 

Interleaved files are less subject to accidents
since all the channels are guaranteed to start at the
same point. For multichannel, include a note
indicating the channel order used, e.g., L, R, C, LFE,
SL, SR or L, R, SL, SR, C, LFE. If you must send split
files, use a standard nomenclature to distinguish the
channels, e.g. Do It_L, Do It_R, Do It_SL, Do It_SR,
Do It_C, Do It_LFE. Letter abbreviations are
preferable to ambiguous channel numbers.
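As a small illustration of the split-file nomenclature, the following Python sketch builds one name per channel from a song stem. The function name, the default channel order and the underline in the stem (following the earlier advice to avoid spaces) are assumptions for the example.

```python
# One of the two channel orders mentioned in the text; state the order
# you actually used in the accompanying note.
CHANNEL_ORDER = ["L", "R", "C", "LFE", "SL", "SR"]


def split_file_names(stem, channels=CHANNEL_ORDER, ext="wav"):
    """Return one file name per channel, using letter abbreviations."""
    return [f"{stem}_{channel}.{ext}" for channel in channels]


if __name__ == "__main__":
    for name in split_file_names("Do_It"):
        print(name)
```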

When you Get your Master Back 

If the CD master is sent back to you instead of
directly to the CD plant, don't handle it or play it.
Play the ref, not the master!


1 There is no track order on a non-linear, file-based medium. Often, clients ask
me, "put the master in the order it's on the CD ROM," but they forget the only
order on the CD ROM is the alphabetical directory of files.

2 Andre Subotin on the Mastering Webboard reminds us that there may be several
true Masters, each of which we must clearly label, e.g. Production Master for
Cassette; Master for foreign countries; etc.

3 Thanks to Clete Baker and Mike McMillan on the Mastering Webboard for
clarifications on these points.

4 Thanks to Clete Baker on the Mastering Webboard for reminding me of this
essential!

5 Thanks to Roger Nichols for the nudge to put these recommendations in the
Appendix.





[ Appendix 4 ] 

Logs and Labels for 
Tapes, Discs and Boxes


Labeling Those Tapes 

I don't dare put an unlabeled DAT or CDR
down on my mastering desk, for it will
immediately be lost in a crowd! Please do put the
following minimal information on every piece of
source media, in case it gets separated from the box:

• Artist 

• Album Title [or working title]

• Contact Name, phone number

• Tape or reel number

• Date [important to help separate out revisions]

Labeling Those Boxes 

The box label contains much more information 
than what’s written on the reel or disc itself. 


Analog Tape Boxes: An example label

Some studios have preprinted labels with 
checkboxes for each option. 

Mixtape, Unedited, songs head leadered [or other 
descriptive] 

Artist:_ 

Album Title: _ 

Record Label: _ 

Reel number:_of _ 

Catalog Number:_ 

Studio, Address, Contact Phone #: _

Engineer: _

A: _

Producer: _

Date: _

Format, EQ, Speed, Level: [e.g. 1/2" 2-track AES
stereo, no noise reduction, 30 IPS, 0 VU = 320 nW/M
or 0 VU = 250 nW/M + 2 dB]

Test Tones @ Head _ @ Tail _ consisting
of _ Hz at 0 VU

Name of Song or track
Length

Comments [e.g. "vocal up" or "needs fadeout" or
"leave countoff at the beginning"]

Name of next song, etc.



















Further comments can be written in a letter that 
accompanies all the media. 

Discs: Example Label

There is not enough room on a CDR or DVD-R
surface to write everything we want to know. Some
studios have prescreened discs with checkboxes. At
minimum, the top surface of the disc itself should
include:

Mixes, Unedited [or submaster or other descriptive] 

Artist:_ 

Album title: _ 

Record Label: _ 

Disc and File Format: [e.g. ISO-9660 or HFS, or 
Masterlink, Stereo AIFF Files, 48 kHz/24 Bit] 

Disc # _ of _ Date: _

[date is very critical]

Plus, if possible:

Contact name and Phone #: _

Catalog Number: _

Since there is not enough room to list all
information on the disc itself, be sure to include the
remaining information on the box, jewel box,
and/or printed log (pictured opposite page) which
accompanies the media. If possible, the log can be
duplicated in a READ_ME.doc file which resides on
the disc, so it will never be lost.

Discs, jewel Box or Paper cover label

Instead of using up several jewel boxes, some
studios cleverly put CDRs inside a taped and folded
printout of the disc's directory, which covers all the
names of the tunes inside the disc. When shipping,
put these paper-covered discs in a foam-lined
hard-box to prevent scratching or breakage. As
described in Appendix 3, the title names can also
include their eventual sequence order, if this is
known at the time of disc creation.

Printed Log/letter 

Accompany the discs or tapes with a printed
log/letter to the mastering engineer. This is where
you can also include all your comments and
thoughts on the eventual mastering. You can put this
in the form of a letter, which includes your story and
feelings about the album and its sound. Some
comments (especially the need for a fade!) may be
superfluous but put down anything you are
concerned about.

Don't forget to include:

Artist:_ 

Album title: _ 

Record Label: _ 

Disc, File Format, Sample Rate, Wordlength: [e.g. 
ISO-9660 or HFS, or Masterlink, Stereo AIFF Files, 48 
kHz/24 Bit] 

Contact name and Phone #: _

Contact Address:_ 

Catalog Number:_ 


















Title/File Name, CD Track Order | Length (approx.) | ABS time/DAT or CD Program ID (not relevant if this is a disc of files) | Comments [e.g. by engineer, producer or artist]


I Wanna Make
You Happy/
05_makehappy.wav

6S6080132805

Love Me Do/
02_lovemedo.wav

Why Me?/
04_yme.wav


5:02 



£56080132804 


This song needs a fadeout. Try starting circa 3:45 and
be out by 4:00 from the downbeat so as not to hear the
snickering! Please include the sticks at the beginning.

This is an obvious tribute to the Beatles. The more
Beatle-like you can make the mastering, the happier I
will be!

This is the only ballad on the album. The artist is not
happy with her intonation entering the last chorus. Is
there anything you can do about this?











































[ Appendix 5 ] 

Decibels 


Marking Analog Tapes 

I once received a 1/4" tape in the mail marked
"the level is +4 dBm." But dBm and dBu do not
travel from house to house. dBu is a measurement
of a voltage expressed in decibels and there is no
voltage on an analog tape, only magnetic flux in
nanowebers per meter. The 1/4" tape doesn't have
any idea whether it was made with a semi-pro level
of 0 VU = -10 dBu or a professional level of +4.
Instead, just indicate the magnetic flux level which
was used to coordinate with 0 VU. For example,
label it 0 VU = 400 nW/M at 1 kHz. 400 nW/M is 6 dB
over 200 nW/M, and engineers often abbreviate
this on the tape box as +6 dB/200, as you can see
from this convenient chart.


Chart 1:

Tape Fluxivity in dB and nanowebers per meter (nW/M)

Level dB   Reference 185   Reference 200   Reference 250   Reference 320   Reference 400
9          521             564             705             902             1127
8          465             502             628             804             1005
7          414             448             560             716             895
6          370             400             500             640             800
5          329             356             445             569             711
4          293             317             396             507             634
3          261             283             353             452             565
2          233             252             315             403             504
1          208             224             281             359             449
0          185             200             250             320             400

Find the actual nanowebers per meter of flux for a given reference flux. For example, a tape which is 4 dB hotter than 250 nW/M is
396, or rounded up to about 400. This is the same fluxivity as a tape which is 6 dB hotter than 200.
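The entries in Chart 1 follow the usual 20·log10 decibel relationship for an amplitude-like quantity. The sketch below is an assumption about how the chart was computed, not the author's own method, and the function names are illustrative. It reproduces most entries when rounded; the 6 dB row in the chart appears to use an exact doubling (400 for a 200 reference), whereas the formula gives about 399.

```python
import math


def fluxivity(reference_nwb_per_m: float, level_db: float) -> float:
    """Flux in nW/M for a tape recorded level_db above the reference flux."""
    return reference_nwb_per_m * 10 ** (level_db / 20)


def level_over_reference(flux_nwb_per_m: float, reference_nwb_per_m: float) -> float:
    """Level in dB of a given flux relative to the reference flux."""
    return 20 * math.log10(flux_nwb_per_m / reference_nwb_per_m)


if __name__ == "__main__":
    print(round(fluxivity(250, 4)))                 # 4 dB over 250 nW/M
    print(round(level_over_reference(400, 200), 2))  # 400 nW/M relative to 200
```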







Chart 2:

dBu (reference 0.775 volts) converted to voltage

dBu     Volts
40      77.500
35      43.581
24      12.283
20      7.750
18      6.156
16      4.890
14      3.884
12      3.085
8       1.947
6       1.546
4       1.228
3       1.095
2       0.976
1       0.870
0       0.775
-10     0.245
-20     0.078
-60     0.001
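Chart 2 can be regenerated for any level with the standard voltage-decibel formula. A minimal sketch, assuming the 0.775 volt reference stated in the chart heading (the function names are illustrative):

```python
import math

DBU_REFERENCE_VOLTS = 0.775  # reference used in Chart 2


def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to volts."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)


def volts_to_dbu(volts: float) -> float:
    """Convert a voltage to a level in dBu."""
    return 20 * math.log10(volts / DBU_REFERENCE_VOLTS)


if __name__ == "__main__":
    for level in (24, 4, 0, -10):
        print(level, round(dbu_to_volts(level), 3))
```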


[ Appendix 6 ] 


Q to Bandwidth Conversions

Use this chart for an equalizer whose controls are marked in bandwidth
but when you wish to think in Q, or vice versa. Bandwidth is expressed in
octaves, at the 3 dB down point. The formula to convert bandwidth to Q is
Q = Square Root(2**BW) / (2**BW - 1).

B/W      Q          Q        B/W
0.02     72.13      0.50     2.54
0.03     48.09      0.55     2.35
0.04     36.07      0.60     2.19
0.05     28.85      0.65     2.04
0.06     24.04      0.70     2.00
0.07     20.61      0.75     1.80
0.08     18.03      0.80     1.70
0.09     16.03      0.85     1.61
0.10     14.42      0.90     1.53
0.20     7.21       0.95     1.46
0.30     4.80       1.00     1.39
0.40     3.60       1.10     1.27
0.50     2.87       1.20     1.17
0.60     2.39       1.30     1.08
0.70     2.04       1.40     1.01
0.80     1.78       1.50     0.94
0.90     1.58       1.60     0.89
1.00     1.41       1.70     0.84
1.20     1.17       1.80     0.79
1.40     0.99       1.90     0.75
1.60     0.86       2.00     0.71
1.80     0.75       3.00     0.48
1.90     0.71       4.00     0.36
2.00     0.67       5.00     0.29
2.20     0.60       6.00     0.24
2.40     0.54       8.00     0.18
2.60     0.49       10.00    0.14
2.80     0.44       20.00    0.07
3.00     0.40       30.00    0.05
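For values that fall between the chart entries, the conversion can be computed directly. The sketch below uses Q = sqrt(2^BW) / (2^BW - 1), which is the reading of the printed formula that agrees with the chart (for example 2.00 octaves gives 0.67). The inverse was obtained by solving that expression for BW and is not given in the text; the function names are illustrative.

```python
import math


def bandwidth_to_q(bw_octaves: float) -> float:
    """Q for a bandwidth in octaves measured at the 3 dB down points."""
    ratio = 2 ** bw_octaves
    return math.sqrt(ratio) / (ratio - 1)


def q_to_bandwidth(q: float) -> float:
    """Bandwidth in octaves for a given Q (algebraic inverse of the above)."""
    return 2 * math.asinh(1 / (2 * q)) / math.log(2)


if __name__ == "__main__":
    print(round(bandwidth_to_q(1.0), 2))
    print(round(q_to_bandwidth(1.0), 2))
```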















[ Appendix 7 ] 

I Feel The Need For Speed 


Columns: speed in MB/hour, MB/min, MB/sec and Mb/sec, then the hours, minutes and seconds needed to run or copy one CD.

Medium                          MB/hour    MB/min    MB/sec    Mb/sec    Hours     Minutes   Seconds
CD Player                       635.04     10.584    0.1764    1.4112    1.00      60.0      3600.0
DSL 384 kbps                    173        2.88      0.048     0.384     3.68      220.5     13230.0
10 BaseT                        4,500      75        1.25      10        0.14      8.5       508.0
100 BaseT                       45,000     750       12.5      100       0.01      0.8       50.8
1000 BaseT (Gigabit Ethernet)   450,000    7500      125       1000      0.0014    0.1       5.1
USB 1.0 slow                    675        11.25     0.1875    1.5       0.94      56.4      3386.9
USB 1.0 fast                    5,400      90        1.5       12        0.12      7.1       423.4
USB 2.0                         216,000    3600      60        480       0.0029    0.2       10.6
Firewire                        180,000    3000      50        400       0.0035    0.2       12.7
Seagate 18 GB Ultra SCSI 160    108,000    1800      30        240       0.01      0.4       21.2
Ultra ATA/66 10,000 RPM         147,600    2460      41        328       0.0043    0.3       15.5
Ultra 2 SCSI 160 MB/s LVD       576,000    9600      160       1280      0.0011    0.1       4.0

Facts

CD Player: CD total bytes in one hour: 635,040,000 (635.04 MB). CD speed, bytes per minute: 10,584,000. About the same as a T1 link.

DSL 384 kbps: Assuming Internet running at maximum efficiency.

1000 BaseT (Gigabit Ethernet): This is theoretical point to point with no
collisions. Ethernet mileage will be much slower on a busy network. Use an
Ethernet Switch instead of a Router to maximize speed and minimize collisions.

Firewire: This is the maximum speed of the interface. Individual drives are much slower.

Seagate 18 GB Ultra SCSI 160: Typical internal transfer rate of a modern LVD drive.

Ultra 2 SCSI 160 MB/s LVD: This is the interface, individual drives much slower. RAID can reach this speed.

Abbreviations: MB = Megabytes, Mb = Megabits (8 bits/byte). All times normalized to capacity of one hour long stereo audio CD.

Mega is defined as 1 million. Kilo is 1 thousand. Some of these
figures would change slightly if kilo is defined as 1024.
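The time columns follow from dividing the size of a one-hour stereo CD (44.1 kHz, 16 bits, two channels) by each interface's rate. A minimal sketch under the stated definitions (mega = 1 million, 8 bits per byte); the names are illustrative and the results are theoretical maximums, as the notes above point out.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2  # sample rate x 2 bytes x 2 channels
CD_HOUR_MB = CD_BYTES_PER_SECOND * 3600 / 1_000_000  # one-hour CD in MB


def seconds_to_copy_cd_hour(interface_megabits_per_sec: float) -> float:
    """Seconds needed to move one hour of CD audio at the given Mb/sec."""
    megabytes_per_sec = interface_megabits_per_sec / 8
    return CD_HOUR_MB / megabytes_per_sec


if __name__ == "__main__":
    for name, rate in [("10 BaseT", 10), ("USB 2.0", 480), ("Firewire", 400)]:
        print(name, round(seconds_to_copy_cd_hour(rate), 1))
```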

















[ Appendix 8 ] 

I Feel The Need For Capacity 


Prior to 1990, I was making CD masters with
linear editing using the Sony 3/4" editing systems.
In 1990 I set up my first nonlinear mastering
workstation, purchasing the highest capacity hard
discs available, a pair of 600 MB SCSI hard discs
that cost $1500 retail, or $1.25 per MB. Fortunately,
as our needs have gone up in 10 years, capacity has
increased geometrically and cost has gone down.
Thus it's not out of line to expect typical storage
capacity to increase tenfold in 10 years.


Year                                        1980           1990             2002            2010
Type of Storage                             Data General   SCSI Hard Disc   IDE Hard Disc   Raid? Optical?
Capacity MB                                 297            600              80,000          800,000
Capacity GB                                 0.297          0.60             80              800
Total Cost, US Dollars                      $35,000        $750             $137            $16
Cost per MB                                 $118           $1.25            $0.0017         $0.0000
Cost per GB                                 $118,000       $1,250           $1.71           $0.02
Number of 1 Hr Compact Discs                0.5            0.94             125.98          1259.76
Number of 1 Hr, 6-Ch., 24-bit
  Surround Masters at 44.1 kHz              N/A            0.2              28.0            279.9
Number of 1 Hr, 6-Ch., 24-bit
  Surround Masters at 96 kHz                N/A            0.1              12.9            128.6
Number of 1 Hr, 48-Ch., 24-bit
  Multitracks at 96 kHz                     N/A            0.012            1.608           16.075

Facts

1980: Size: 2 feet x 3 feet x 3-1/2 feet high!

1990: CD one hour: 635,040,000 bytes.

2002: Street price.

2010: Projected cost, as per archivebuilders.com.

Abbreviations: MB = Megabytes, GB = Gigabytes (1000 MB)
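The "number of one-hour programs" columns are the storage capacity divided by the size of one hour of linear PCM audio. A hedged sketch of that arithmetic, using MB = 1,000,000 bytes as in the abbreviations above (function names are illustrative):

```python
def megabytes_per_hour(channels: int, bits: int, sample_rate_hz: int) -> float:
    """Size of one hour of linear PCM audio in MB (1 MB = 1,000,000 bytes)."""
    bytes_per_second = channels * (bits // 8) * sample_rate_hz
    return bytes_per_second * 3600 / 1_000_000


def hours_that_fit(capacity_mb: float, channels: int, bits: int, sample_rate_hz: int) -> float:
    """How many one-hour programs of the given format fit in capacity_mb."""
    return capacity_mb / megabytes_per_hour(channels, bits, sample_rate_hz)


if __name__ == "__main__":
    capacity = 80_000  # an 80 GB drive, expressed in MB
    print(round(hours_that_fit(capacity, 2, 16, 44_100), 2))   # stereo CDs
    print(round(hours_that_fit(capacity, 6, 24, 96_000), 1))   # 6-channel surround
    print(round(hours_that_fit(capacity, 48, 24, 96_000), 3))  # 48-track multitrack
```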










[ Appendix 9 ] 

Footnotes on The K-System 


The VU Meter’s Actual Ballistics 

The VU meter's actual ballistics were analyzed
as early as 1940. According to A New Standard
Volume Indicator and Reference Level,
Proceedings of the I.R.E., January, 1940, the
mechanical VU meter used a

copper-oxide full-wave rectifier
which, combined with electrical
damping, had a defined averaging
response according to the formula i = k
* e to the p equivalent to the actual
performance of the instrument for
normal deflections. (In the equation i
is the instantaneous current in the
instrument coil and e is the instantaneous
potential applied to the
volume indicator)....a number of the
new volume indicators were found to
have exponents of about 1.2.

Therefore, their characteristics are
intermediate between linear (p = 1)
and square-law or root-mean-square
(p = 2) characteristic.


History and Development of the SMPTE Standard, 
from Errors to Knowledge 

The theatre standard, Proposed SMPTE
Recommended Practice: Relative and Absolute
Sound Pressure Levels for Motion-Picture
Multichannel Sound Systems, SMPTE Document
RP 200, defines the calibration method in detail. In
the 1970's the value had been quoted as 85 at 0 VU
but as the measurement methods became more
sophisticated, this value proved to be in error. It has
now been proved to be 85 at -18 dBFS RMS with 0 VU
remaining at -20 dBFS (sine wave). The history of
this metamorphosis is interesting. A VU meter was
originally used to do the calibration, and with the
advent of digital audio, the VU meter was calibrated
with a sine wave to -20 dBFS. However, it was
forgotten that a VU meter does not average by the
RMS method, which resulted in an error between
the RMS electrical value of the pink noise and the
sine wave level. While 1 dB is the theoretical
difference, in practice I've seen as much as a 2 dB
discrepancy between certain VU meters and the true
RMS pink noise level.
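The "1 dB theoretical difference" can be checked with a short calculation. This sketch rests on two simplifying assumptions that are not from the text: an ideal full-wave average-responding meter (exponent p = 1, whereas real VU meters were measured nearer 1.2) and noise with a Gaussian amplitude distribution. Under those assumptions, a meter calibrated on a sine wave under-reads noise of the same RMS level by about 1 dB.

```python
import math


def to_db(ratio: float) -> float:
    """Express an amplitude ratio in decibels."""
    return 20 * math.log10(ratio)


# Rectified average divided by RMS for a sine wave: (2/pi) / (1/sqrt(2)).
SINE_AVG_OVER_RMS = (2 / math.pi) / (1 / math.sqrt(2))

# Rectified average divided by RMS for Gaussian noise: sqrt(2/pi).
NOISE_AVG_OVER_RMS = math.sqrt(2 / math.pi)

# Reading error of a sine-calibrated averaging meter on noise of equal RMS.
READING_ERROR_DB = to_db(NOISE_AVG_OVER_RMS / SINE_AVG_OVER_RMS)

if __name__ == "__main__":
    print(round(READING_ERROR_DB, 2))
```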

The other problem is the measurement
bandwidth: a wide bandwidth voltmeter will show
attenuation of the source pink noise signal on a long
distance analog cable due to capacitive losses. The
solution is to define a specific measurement
bandwidth (20 kHz). By the time all these errors
were tracked down, it was discovered that the
historical calibration was in error by 2 dB. Using
pink noise at an RMS level of -20 dBFS RMS must
correctly result in an SPL level of only 83 dB. In
order to retain the magic 85 number, the SMPTE
decided to raise the specified level of the calibrating
pink noise to -18 dBFS RMS, but the result is the
identical monitor gain. One channel is measured at
a time, the SPL meter set to C weighting, slow, and
as explained in Chapter 14, a more accurate
measurement can be obtained via 1/3 octave
analysis. The K-System is consistent with RP 200
only at K-20. I feel it will be simpler in the long run
to calibrate to 83 dB SPL at the K-System meter's 0
dB rather than confuse future users with a
non-standard +2 dB calibration point.

It is critical that the thousands of studios with
legacy systems that incorporate VU meters should
adjust the electrical relationship of the VU meter
and digital level via a sine wave test tone, then
ignore the VU meter and align the SPL with an
RMS-calibrated digital pink noise source.

Detailed Specifications of the K-System Meters 

General: All meters have three switchable
scales: K-20 with 20 dB headroom above 0 dB, K-14
with 14 dB, and K-12 with 12 dB. The K/RMS meter
version (flat response) is the only required meter,
to allow RMS noise measurements, system
calibration, and program measurement with an
averaging meter that closely resembles a slow VU
meter. The other K-System versions measure
loudness by various known psychoacoustic methods
(e.g., LEQ and Zwicker).

* Thanks to Tomlinson Holman (in correspondence) for explaining the historical
source of the measurement errors, and how 85 became 83 after a long battle.

Scales and frequency response: A tri-color
scale has green below 0 dB, amber to +4 dB, and red
above that to the top of scale. The peak section of the
meters always has a flat frequency response, while
the averaging section varies depending on which
version is loaded. For example: Regardless of the
sampling rate, meter version K-20/RMS is
band-limited as per SMPTE RP 200, with a flat frequency
response from 20-20 kHz +/- 0.1 dB, the averaging
section uses an RMS detector, and 0 dB is 20 dB
below full scale. To maintain pink noise calibration
compatibility with SMPTE proposal RP 200, the
meter's bandpass will be 22 kHz maximum
regardless of sample rate.

Other loudness-determining methods are
optional. The suggested average section of Meter
K-20/LEQA has a non-flat (A-weighted) frequency
response, and response time with an
equal-weighted time average of 3 seconds. Since loudness
is generally an overall sensation, a case can be made
for a monophonic loudness meter. Expert
psychoacousticians designing a true loudness K-System
meter must resolve that discrepancy, and permit
production engineers to retain the desirable
individual channel meters. They will calculate the
proportion of the total loudness in each channel.
The average section of Meter K-20/Zwicker
corresponds with Zwicker's recommendations for
loudness measurement. Regardless of the frequency
response or methodology of the loudness method,
reference 0 dB of all meters is calibrated such that
20-20 kHz pink noise at 0 dB reads 83 dB SPL on
each channel, C weighted, slow. Psychoacousticians
designing loudness algorithms recognize that the
two measurements, SPL and loudness, are not
interchangeable and take the appropriate steps to
calibrate the K-System loudness meter 0 dB so that
it equates with a standard SPL meter at that one
critical point with the standard pink noise signal.

Scale gradations: The scale is linear-decibel
from the top of scale to at least -24 dB, with marks at
1 dB increments except the top 2 decibels have
additional marks at 1/2 dB intervals. Below -24 dB,
the scale is non-linear to accommodate required
marks at -30, -40, -50, -60. Optional additional
marks through and beyond -70. Both the peak and
averaging sections are calibrated with sine wave to
ride on the same numeric scale. Optional
(recommended): A 10X expanded scale mode, 0.1 dB
per step, for calibration with test tone.

Peak section of the meter: The peak section
represents the true, flat (1 sample) peak level,
regardless of which averaging meter is used. An
additional pointer above the moving peak
represents the highest peak in the previous 10
seconds. Designers can add an oversampling peak
movement as long as it is clearly marked and
identified, especially since all our emphasis on
loudness judgment is based on the averaging section
and its scale. A peak hold/release button on the
meter changes this pointer to an infinite high peak
hold until released. The meter has a fast rise time
(aka integration time) of one digital sample, and a
slow fall time, ~3 seconds to fall 26 dB. An
adjustable and resettable OVER counter is highly
recommended, counting the number of contiguous
samples that reach full scale.

Averaging section: An additional pointer above
the moving average level represents the highest
average level in the last ten seconds. An average
hold/release button on the meter changes this
pointer to an infinite highest average hold until
released. The RMS calculation should average at
least 1024 samples to avoid an oscillating RMS
readout with low frequency sine waves, but keep a
reasonable latency time. If it is desired to measure
extreme low frequency tones with this meter, the
RMS calculation can optionally be increased to
include more samples, but at the expense of latency.
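As a rough illustration of the averaging section, the sketch below computes one RMS reading over a 1024-sample window and places it on a K-System scale. It is an interpretation under stated assumptions, not a reference implementation: samples are floats with full scale at 1.0, the sine calibration is modelled as a +3.01 dB offset so that a sine wave reads the same on the peak and average scales, and the function name is hypothetical.

```python
import math

WINDOW_SAMPLES = 1024  # minimum averaging length suggested in the text


def k_meter_average_db(samples, k_headroom_db=20):
    """Average reading in dB on a K-System scale (0 dB = -k_headroom_db dBFS)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    # Multiplying by sqrt(2) makes a sine wave's RMS reading equal its peak
    # level in dBFS, so both meter sections ride on the same numeric scale.
    dbfs_sine_calibrated = 20 * math.log10(rms * math.sqrt(2))
    return dbfs_sine_calibrated + k_headroom_db


if __name__ == "__main__":
    # A sine wave peaking at -20 dBFS (amplitude 0.1), 16 whole cycles per window.
    tone = [0.1 * math.sin(2 * math.pi * 16 * n / WINDOW_SAMPLES)
            for n in range(WINDOW_SAMPLES)]
    print(round(k_meter_average_db(tone, 20), 2))
```

With a window that does not hold a whole number of cycles, the reading wobbles slightly from window to window, which is the oscillation the 1024-sample minimum is meant to limit.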

Ballistics: This is only relevant to the RMS
meter, as the "ballistics" of the true loudness
versions will be determined by the algorithm. After
RMS calculation, the meter ballistics are calculated,
with a specified integration time of 600 ms to reach
99% of final reading (this is half as fast as a VU
meter). The fall time is identical to the integration
time. Rise and fall times should be exponential.
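One way to realize an exponential rise and fall that reaches 99% of the final reading in 600 ms is a one-pole smoothing filter applied to successive RMS readings. This is a sketch under assumptions the text does not specify: the meter update rate is a free parameter, and the smoothing is applied to whatever quantity the meter displays.

```python
import math

INTEGRATION_TIME_S = 0.6  # time to reach the target fraction of a step
TARGET_FRACTION = 0.99


def smoothing_coefficient(update_rate_hz: float) -> float:
    """Per-update decay factor of a one-pole filter meeting the 600 ms spec."""
    time_constant = INTEGRATION_TIME_S / math.log(1 / (1 - TARGET_FRACTION))
    return math.exp(-1 / (update_rate_hz * time_constant))


def apply_ballistics(readings, update_rate_hz: float):
    """Smooth a sequence of readings; rise and fall are both exponential."""
    coefficient = smoothing_coefficient(update_rate_hz)
    smoothed, state = [], 0.0
    for value in readings:
        state = coefficient * state + (1 - coefficient) * value
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # Step from 0 to 1 at 1000 updates per second; inspect the value at 600 ms.
    step_response = apply_ballistics([1.0] * 600, 1000)
    print(round(step_response[-1], 3))
```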


[ Appendix 10 ] 

Recommended Reading, 

CDs for Equipment Testing and Ear Training


Books 

Burroughs, Lou (1974) Microphones: Design and
Application, Sagamore Publishing Company,
Inc. Out of Print. A classic audio work, the first
book to publish the 3 to 1 rule with frequency
measurements of the anomalies.

Holman, Tomlinson (2000) 5.1 Surround Sound: Up
and Running, Focal Press. Also includes guides
on the problems of locating speakers near
consoles.

Howard, David M. & Angus, James (2001) Acoustics
and Psychoacoustics, Focal Press. Includes good
discussion of the time/frequency relationship
of filtering.

Kefauver, Alan P. (1999) Fundamentals of Digital
Audio, A-R Editions, Madison, WI.

Kirk, Ross & Hunt, Andy (1999) Digital Sound
Processing for Music and Multimedia, Focal Press,
Boston.

Nisbett, Alec (2003) The Sound Studio: audio
techniques for radio, television, film and recording,
7th Edition. Focal Press. A classic work with
practical techniques which will never go out of
style. I started with the 1962 edition!

Owsinski, Bobby (2000) Mastering Engineer's
Handbook, ISBN # 0-87288-741-3. A collection
of interviews with mastering engineers.

Pohlman, Ken (2000) Principles of Digital Audio,
McGraw Hill.

Watkinson, John (1988, regularly revised) The Art of
Digital Audio, Focal Press, ISBN 0 240 51320 7.
The definitive industry bible. This is where you
must first go for in-depth information on how
digital audio works and the specifications of
much of today's digital audio equipment and
interfaces.

Magazines 

One To One Magazine, United Business Media
International Ltd, Leics, United Kingdom,
http://www.cmpinformation.com.

Articles in Print 

Blesser, A. & Locanthi, B. (1986) The Application of
Narrow-Band Dither Operating at the Nyquist
Frequency in Digital Systems to Provide Improved
Signal-to-Noise Ratio over Conventional Dithering,
AES 81st Convention, Preprint 2416.

Cabot, Richard C. (1989) Measuring AES-EBU Digital
Audio Interfaces, AES 87th Convention Preprint
2819(1-8).

Gerzon, M.A., Craven, P.G., Stuart, J.R., & Wilson,
R.J. (1993) Psychoacoustic Noise-Shaped
Improvements to CD and other Linear Digital
Media, AES 94th Convention, Preprint #3501.

Lipshitz, S.P. & Vanderkooy, J. (1989) Digital Dither:
Signal Processing with Resolution Far Below the
Least Significant Bit. AES 7th International
Conference, Audio In Digital Times, Toronto.
87-96.

Muncy, Neil, Whitlock, Bill et al (1995) Collection of
definitive articles on grounding, shielding,
power supply, EMI, RFI. Journal of the AES Vol.
43 Number 6, a special excerpt printing.

Nielsen, Soren & Lund, Thomas (2000) 0 dBFS+
Levels in Digital Mastering. AES 109th
Convention, Preprint #5251.

Stuart, J.R. & Wilson, R.J. (1991) A Search for
Efficient Dither for DSP Applications, AES
92nd Convention, Preprint #3334.

Stuart, J.R. (1993) Noise: Methods for Estimating
Detectability and Threshold, 94th AES
Convention Preprint #3477.

Compact Discs

Auralia, Complete Ear Training software for musicians,
Rising Software, Australia,
http://www.risingsoftware.com

Grimm, Eelco (2001) Checkpoint Audio Professional
Audio Test Reference, Contekst Publishers,
Netherlands, ISBN 90-806111-1-5. Test and
listening CD including J-Test, Bonger Test, and
unique distortion and listening tests. Written in
Dutch with no English translation (as of 2002).

Moulton, David, David Moulton's Audio Lecture Series.
Golden Ears audio ear-training self-study course,
KIQ Productions (Golden Ears), or
http://www.moultonlabs.com

Various compilation and test CDs, Chesky Records,
http://www.chesky.com.




Articles On the Internet 

Dunn, Julian, AES 3 and IEC60958, item #26 written
for Audio Precision,
http://www.audioprecision.com/publications/technotes/index.htm

Dunn, Julian, various articles on jitter and other audio
topics, at Nanophon,
http://www.nanophon.com.

Lavry, Dan, various articles on sampling, oversampling,
jitter, etc., http://www.lavryengineering.com/
in the Product Support area.

Story, Mike, various articles on high sampling rates,
jitter, etc. DCS, Inc.
http://www.dcsltd.co.uk/papers/

SMPTE RP200 proposed standard. SMPTE
http://www.smpte.org/stds/.

TC Electronic, articles on jitter, 5.1 surround, 0
dBFS+ levels, etc.
http://www.system6000.com/system6000.asp?Section=19


[ Appendix 11 ] 

Eric James Biography 



Eric James is an Englishman, in his mid-forties,
inordinately fond of chamber music and
acoustic jazz, who has been a university teacher (of
the history and philosophy of science and medicine)
in Hong Kong for twelve years. He has decided, after
giving the matter much thought, that these four facts
probably have nothing very much to do with the
tremendous satisfaction he derived from working,
as editor, with Bob on this book. On the other hand,
although Eric has been an academic for fifteen
years, before he started his graduate studies (late, at
Oxford, in the history and philosophy of
mathematics) he spent a large part of his working
life as a professional musician, and he has very
recently resigned his academic tenure in order to
return to the UK to develop the music recording and
editing company, URM Audio, which he had been
running on a part-time basis since 1998.

In 2001 Eric became a father, and he would like
to thank his daughter, Jamie Martha Perry, and
her mother, Sally, for putting just about everything
else into its proper perspective.




[ Appendix 12 ] 

Robert A. Katz Biography 



From his earliest years, Bob has been as curious
as a Katz. He voraciously reads audio books, service
manuals, product spec sheets, license plates, and
bumper stickers. But his favorite reads are science
fiction writers Spider Robinson and Frederick
Pohl, which may explain Bob's punny personality.
In his teens he dabbled in hypnotism and magic, but
was a bit klutzy to turn that into a career. Bob is an
animal lover: all dogs and cats love him back.


Coming from a family of medical doctors,
musicians and composers, Bob gravitated to the B
flat clarinet at the age of ten; his aunt, a viola
teacher, gave Bob his first lesson in solfège and
transposition. At the age of 13, he rebuilt his first
tape recorder. After wiring the house for sound, he
was forced by his parents to remove the
microphones he had secreted throughout the house.
Clearly destined for a career in audio, by high school
he had begun an amateur recording career, plus
studying the sciences and linguistics, practicing
French and Spanish and looking for female pen pals
on three continents. Perhaps out of default he was
voted most versatile in his class. Eventually his
language skills would reach the point where he can
give seminars in any of three languages.

An enthusiastic young man with a passion for
good sound, Bob developed a reputation as an
audiophile around Hartford, Connecticut town. The
local audio stores regularly invited him over, for Bob
is never short of opinions. One day he was invited to
audition a new pair of speakers with the designer
present. After hearing a few notes, Bob ran out of
the store covering his ears! Over the years, he has
learned to be more diplomatic, but his opinions
continue to be defined by a love for the art of audio.

In college he played in an ad hoc Dixieland
ensemble, and the treat of his performance life was
soloing Sweet Georgia Brown before the homecoming
football crowd. Two years at Wesleyan University
were followed by two more at the University of
Hartford, studying Communication and Theatre, but
he spent less time in the classroom and more at the
college radio station, where he became recording
director. A fan of the Firesign Theatre, Bob used to
write and edit humorous radio ads, and he became a
DJ, manning a free-form-progressive rock radio
show titled The Katz Meow, and doing a stint on the
commercial rock station.

Bob taught himself analog and digital
electronics, and was influenced by a number of
creative designers. In Hartford, Bob's mentor was
Steve Washburn, an EE who invented a way to nearly
double the power-handling of a Hartley 24" woofer
and also constructed Bob's first custom-built
portable audio console. Just out of college, Bob
became (1972) Audio Supervisor of the Connecticut
Public Television Network, producing every type of
program from game shows to documentaries, music
and sports, and he learned to mix all kinds of music
live. When he wasn't working television, he was on
location recording music groups direct to 2-track.


In 1972, Bob wrote his first article for dB
magazine, describing a set of mike heaters he
developed to warm his AKG microphones and keep
them from sputtering due to changes in humidity.
This sparked a heated controversy as Stephen
Temmer of Gotham Audio wrote a response stating
that "Neumann microphones are never affected by
humidity" but Bob's experience was supported by
some others and in those pre-Internet times the
controversy remained of modest proportions.
Hooked by the writing bug, Bob is a natural-born
teacher who puts himself in the mind of the learner.
He has written over a hundred articles and reviews
in publications such as dB, RE/P, Mix, Audio Media,
JAES, PAR, and Stereophile.

In 1977 he moved from Connecticut to New
York City, and began a recording career in records,
radio, TV, and film as well as building and designing
recording studios and custom recording equipment.
Long before the advent of the home PC, Bob taught
himself several computer languages, and sold one
assembly-language program used in an embedded
system at a brokerage firm. During the primitive
time before cell phones, the voice of Matilda became
well known. Matilda answered Bob's phone and
forwarded calls to any place Bob happened to be.
Visitors to Bob's house were dismayed to discover
that sultry-voiced Matilda was not flesh and blood
but rather a 6502-based controller, DTMF
encoders, decoders and other gear. Matilda's true
identity remains a mystery today.

From 1978-79, he taught at the Institute of
Audio Research, supervised the rebuild of their
audio console and studios and began a friendship
with IAR's founder Al Grundy, mentor and
influence. Other New York era influences include
Ray Rayburn and acousticians Francis Daniel and
Doug Jones. In the 80's, one of his clients was the
spoken-word label, Caedmon Records, where he
recorded actors including Lillian Gish, Ben
Kingsley, Lynn Redgrave and Christopher Plummer.

An active member of the New York Audio Society,
Bob was the ultimate audiophile. This led to a
full-page interview/article in the Village Voice called Sex
With The Proper Stereo, a story about Bob's railroad
apartment on East 90th with the empty refrigerator
in the kitchen and mysterious monoliths in the
living room.

But the refrigerator was not empty for long. In
1984, Bob was doing sound for a motion picture in
Venezuela and met multi-lingual Mary Kent,
production assistant. After the filming, Bob invited
Mary to come to New York for a vacation that became
a permanent engagement! One day new girlfriend
Mary came home and turned on the stereo system in
the wrong order, blowing up the Krell amplifier and
one of the Symdex woofers, producing sparks and
blue smoke. When Bob arrived home, he calmed her
down: "Don’t worry, Mary, your love for me means
more than any stereo system." Bob and Mary have
been together ever since (Mary jokes that she’s
really in love with the stereo system).

One day Bob received a call from musician
David Chesky, who had read the Voice article and was
looking for an audiophile recording engineer. In
1988 this led to a long and pleasant association with
Chesky Records, which became the premier
audiophile record label. Bob specializes in
minimalist miking techniques (no overdubs) for
capturing jazz and other music that commonly is
multimiked. His recordings are musically balanced,
exciting and intimate while retaining dynamics,
depth and space. In 1989 he built the first working
model of the DBX/UltraAnalog 128X oversampling
A/D converter, and produced the world’s first
oversampled commercial recordings. Over the
years, the converter was refined, until by 1996 Bob
found a commercial model that performed slightly
better. Bob has recorded about 150 records for
Chesky, including his second Grammy-winner, and
in 1997 the world’s first commercial 96 kHz/24 bit
audio DVD (on DVD-Video).

This obsession with good sound has developed
into Bob's passion: Mastering with a Capital M. Every
day, he applies his specialized techniques to bring
the exciting sound qualities of live music to every
form recorded today. In 1990 he founded Digital
Domain, which masters music from pop, rock, and
rap to audiophile classical. Besides mastering,
Digital Domain provides complete services to
independent labels and clients, graphic design and
replication. Mary, who became Bob’s wife, is an
accomplished photographer and graphic artist, the
visual half of the Digital Domain team and more
than two-thirds of the charm. In 1996, Bob and
Mary moved the company from New York to
Orlando, adding numerous Florida-based artists
and labels to the international clientele.

In the 90’s, Bob invented three commercial
products, found in mastering rooms around the
world. The first product, the FCN-1 Format Converter,
was dubbed by Roger Nichols the Swiss-Army knife of
digital audio. Then came the VSP model P and S
Digital Audio Control Centers, which received a
Class A rating in Stereophile Magazine. These devices
perform jitter reduction, routing, and sample rate
conversion.

Bob has delivered lectures and seminars to the
Audio Engineering Society at the conventions and
sections and chaired AES workshops. He has been
Convention Workshops Chairman, Facilities
Chairman and served as Chairman of the AES New
York Section. In 1991, Bob began the Digido
website, the second audio URL to make the World
Wide Web, an educationally-oriented site which has
grown to be a premium source for audio
information. Over 1000 pages around the globe have
linked to www.digido.com.

Bob’s first 21st century invention is patent
pending. He designed and introduced an entire new
category of audio processor, the Ambience
Recovery Processor, which uses psychoacoustics to
extract and enhance the existing depth, space, and
definition of recordings. Z-Systems of Florida and
Weiss Audio of Switzerland have licensed Bob’s
K-Stereo™ and K-Surround™ processes.

Bob has mastered CDs for labels including EMI,
BMG, Virgin, Warner (WEA), Sony Music, Walt
Disney, Boa, Arbors, Apple Jazz, Laser’s Edge, and
Sage Arts. He enjoys the Celtic music of Scotland,
Ireland, Spain and North America, Latin and other
world-music, Jazz, Folk, Bluegrass, Progressive
Rock/Fusion, Classical, Alternative-Rock, and many
other forms. Clients include a performance artist
and poet from Iceland; several Celtic and rock
groups from Spain; the popular music of India; top
rock groups from Mexico and New Zealand;
progressive rock and fusion artists from North
America, France, Switzerland, Sweden and Portugal;
Latin-Jazz, Merengue and Salsa from the U.S., Cuba,
and Puerto Rico; Samba/pop from Brazil; tango and
pop music from Argentina and Colombia;
classical/pop from China; and a Moroccan group
called Mo' Rockin'.

Bob mastered Olga Viva, Viva Olga, by the
charismatic Olga Tañon, which received the
Grammy for Best Merengue Album, 2000. Portraits
of Cuba, by virtuoso Paquito D'Rivera, received the
Grammy for Best Latin Jazz Performance, 1996. The
Words of Gandhi, by Ben Kingsley, with music by Ravi
Shankar, received the Grammy for Best Spoken
Word, 1984. In 2001 and 2002, the Parents' Choice
Foundation bestowed its highest honor twice on
albums Bob mastered, giving the Gold Award to
children’s CDs, Ants In My Pants, and Old Mr. Mackle
Hackle, by inventive artist Gunnar Madsen. The Fox
Family’s album reached #1 on the Bluegrass charts.
African drummer Babatunde Olatunji’s Love Drum
Talk, 1997, was Grammy-nominated.

Bob’s recordings have received disc of the month
in Stereophile and other magazines numerous times.
Reviews include: "best audiophile album ever
made" (McCoy Tyner: New York Reunion reviewed in
Stereophile). "If you care about recorded sound as I
do, you care about the engineers who get sound
recorded right. Especially you appreciate a man like
Bob Katz who captures jazz as it should be caught."
(Bucky Pizzarelli, My Blue Heaven reviewed in the
San Diego Voice & Viewpoint). "Disc of the month.
Performance 10, Sound 10" (David Chesky: New York
Chorinhos, in CD Review). "The best modern-instrument
orchestral recording I have heard, and I
don’t know of many that really come close." (Bob’s
remastering of Dvorak: Symphony 9, reviewed in
Stereophile).

Some of the great artists Bob is privileged to
have recorded and/or mastered include: Afro-Cuban
All Stars, Monty Alexander, Carl Allen, Jay
Anderson, Lenny Andrade, Michael Andrew,
Lucecita Benitez, Berkshire String Quartet, Gordon
Bok, Luis Bonfa, Boys of the Lough, Bill Bruford,
Ron Carter, Cyrus Chestnut, George Coleman, Larry
Coryell, Eddie Daniels, Los Dan Den, Dave Dobbyn,
Paquito D'Rivera, Arturo Delmoni, Garry Dial, Dr.
John, Toulouse Engelhardt, Robin Eubanks, George
Faber, John Faddis, David Finck, Tommy Flanagan,
Foghat, Fox Family, Johnny Frigo, Ian Gillan, Dizzy
Gillespie, Whoopi Goldberg, Bill Goodwin, Arlo
Guthrie, Steve Hackett, Lionel Hampton, Emmy Lou
Harris, Tom Harrell, Hartford Symphony, Jimmy
Heath, Vincent Herring, Conrad Herwig, Jon Hicks,
Billy Higgins, Milt Hinton, Fred Hirsch, Freddie
Hubbard, David Hykes Harmonic Choir, Dick
Hyman, Ahmad Jamal, Antonio Carlos Jobim,
Clifford Jordan, Sara K., Connie Kay, Kentucky
Colonels, Lee Konitz, Peggy Lee, Chuck Loeb, Joe
Lovano, Patti Lupone, Gunnar Madsen, Jimmy
Madison, Taj Mahal, Sean Malone, Manhattan
String Quartet, Herbie Mann, Michael Manring,
Marley’s Ghost, Wynton Marsalis, Dave McKenna,
Jackie McLean, Jim McNeely, Milladoiro,
Mississippi Charles Bevels, Max Morath, Paul
Motian, New England Conservatory Ragtime
Ensemble, New York Renaissance Band, Gene
Parsons, Gram Parsons, Danilo Perez, Itzhak
Perlman, Billy Peterson, Ricky Peterson, Bucky
Pizzarelli, John Pizzarelli, Chris Potter, Kenny
Rankin, Mike Renzi, Rincón Ramblers, Sam Rivers,
Red Rodney, Rodrigo Romani, Phil Rosenthal,
Mongo Santamaría, Horace Silver, Lew Soloff,
George 'Harmonica' Smith, Janos Starker, Olga
Tañon, Livingston Taylor, Clark Terry, Thad
Jones/Mel Lewis Big Band, Steve Turre, Stanley
Turrentine, McCoy Tyner, Jay Ungar, U.S. Coast
Guard Band, U.S. Marine Band, Amadito Valdez,
Kenny Washington, Peter Washington, Doc Watson
and Son, Clarence White, Widespread Jazz
Orchestra, Robert Pete Williams, Larry Willis, and
Phil Woods.

—by Mary Kent (who knows him best) 




[ Appendix 13] 

Glossary 


A 

ABSOLUTE LOUDNESS: A term I use when comparing the apparent
loudness of different sources without moving the monitor control.

AES/EBU: The name of a digital audio interface jointly conceived by the
Audio Engineering Society and the European Broadcasting Union. See
Chapter 20.

AGC: Automatic Gain Control. Compression that brings up low-level
passages. See Chapter 11.

AIFF: (along with WAVE, BWF, SD2, MP3): A type of audio file format.
See Appendix 3. 

ALIASING: An alias is a beat note or difference frequency between the
audio content and the sample rate, a form of intermodulation
distortion. Proper filtering should eliminate aliases, but see Chapters
16 and 18. Note that in an A/D converter, the higher the sample rate, the
less chance of aliasing products being created against the normal audio
content, but aliasing distortions could still arise from RF interference.
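
As a rough illustration of the "difference frequency" idea, the sketch below folds an unfiltered input tone back into the audible band. It assumes an ideal sampler with no anti-alias filter; the function name and the example frequencies are illustrative only and do not come from the book.

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which an unfiltered tone appears after sampling.

    Assumes an ideal sampler with no anti-alias filter (hypothetical case).
    """
    nyquist = sample_rate_hz / 2.0
    folded = tone_hz % sample_rate_hz        # images repeat every sample rate
    if folded > nyquist:
        folded = sample_rate_hz - folded     # difference frequency vs. sample rate
    return folded


if __name__ == "__main__":
    # A 30 kHz tone sampled at 44.1 kHz SR lands at 44.1 - 30 = 14.1 kHz.
    print(alias_frequency(30_000.0, 44_100.0))
    # At 96 kHz SR the same tone is below half the sample rate and does not alias.
    print(alias_frequency(30_000.0, 96_000.0))
```

The second call shows why a higher sample rate leaves less chance of aliases falling against normal audio content.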

ASRC: Asynchronous sample rate converter. A converter from one
sample rate to another which can work with a wide relationship of input
to output frequency, and thus can deal with varispeeded rates. Filter
coefficients are continuously variable, computed on the fly. See Chapter

A-WEIGHTING: See Weighting.

C 

COMPACT DISC: A 16-bit stereo 5” disc standard jointly developed by
Sony and Philips in 1980. It can carry digital audio (Red Book standard)
or standard computer files (Yellow Book), and other formats as well.

COMPRESSION RATIO: The ratio between input and output level of a
compressor at the threshold point. See Chapter 10.

D 

DAT: Digital Audio Tape Recorder. Short for RDAT, which stands for
Digital Audio Tape recorder with rotating heads. There was an SDAT
(stationary head) standard, but this was never released.


DAW: Digital Audio Workstation. Usually a computer with dedicated
hardware and software for editing and processing digital audio.

DB: Decibels. A logarithmic measure of audio level. See Chapter 5.

DBFS: The level meters on digital equipment all read in dBFS, decibels
below full scale. Full scale is the highest signal which can be recorded.
Positive-going signals with a value of 32767 or negative with a value of
-32768 (at 16-bit) are at the maximum. Levels below those are translated
to decibels, with 0 dBFS being full scale. For example, -10 dBFS is a
level 10 dB below full scale. 0 dBFS means "0 dB reference full scale,"
as on a digital meter. Full scale is 0 dB and the meter reads negatively
below that.
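
The translation from sample values to dBFS can be sketched as below. This is a minimal example assuming 16-bit samples and the usual 20·log10 amplitude convention; the function names are illustrative and are not taken from any particular meter.

```python
import math

FULL_SCALE_16_BIT = 32768  # magnitude of the most negative 16-bit sample


def sample_to_dbfs(sample: int, full_scale: int = FULL_SCALE_16_BIT) -> float:
    """Level of one sample's magnitude in decibels relative to full scale."""
    if sample == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(abs(sample) / full_scale)


def dbfs_to_amplitude(level_dbfs: float, full_scale: int = FULL_SCALE_16_BIT) -> float:
    """Sample magnitude that corresponds to a given dBFS level."""
    return full_scale * 10.0 ** (level_dbfs / 20.0)


if __name__ == "__main__":
    print(sample_to_dbfs(-32768))      # full scale
    print(sample_to_dbfs(16384))       # half of full scale, about 6 dB down
    print(dbfs_to_amplitude(-10.0))    # magnitude of a -10 dBFS peak
```

Note that the result is never positive for a legal sample value, which is why the meter reads negatively below full scale.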

DITHER: A process that linearizes digital audio by adding a random
noise signal at the point of the circuit just before wordlength truncation.
Dither is absolutely required for clean digital audio recording and
processing. After dithering, the wordlength can be safely truncated or
shortened, but truncation without dithering results in quantization
distortion. See Chapter 4.
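
The effect can be sketched with a constant signal smaller than one 16-bit step. The example assumes TPDF (triangular) dither of plus or minus one LSB and a rounding requantizer, which are common choices but are assumptions here rather than the specific method of any product named in this book; all names are illustrative.

```python
import math
import random

STEP = 1 << 8  # one 16-bit LSB, measured in 24-bit units


def reduce_to_16_bits(sample_24: int, rng: random.Random, dither: bool = True) -> int:
    """Requantize one 24-bit integer sample to 16 bits (rounding quantizer)."""
    value = sample_24 / STEP  # now in units of the 16-bit LSB
    if dither:
        # TPDF dither: sum of two uniform sources, peak amplitude +/- 1 LSB
        value += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return math.floor(value + 0.5)


def average_output(sample_24: int, trials: int, dither: bool) -> float:
    """Mean 16-bit output for a constant input, over many samples."""
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    total = sum(reduce_to_16_bits(sample_24, rng, dither) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    quarter_lsb = 64  # a signal one quarter of a 16-bit LSB in size
    print(average_output(quarter_lsb, 20_000, dither=False))  # signal vanishes
    print(average_output(quarter_lsb, 20_000, dither=True))   # close to 0.25
```

Without dither the low-level information is simply lost; with dither it survives on average, carried within the added noise.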

DSD (direct stream digital) is the audio format used on the SACD
(Super Audio Compact Disc), a rival format to the DVD-A. DSD, as
opposed to multibit PCM, carries audio information using one-bit
encoding. See Chapter 18.

DVD-A: DVD originally stood for Digital Video Disc, but it has now been
dubbed Digital Versatile Disc as it can support computer, audio, and
video formats. The -A suffix defines the multichannel audio disc
standard that supports a wide range of PCM sample rates and
wordlengths, and limited (still) graphics.

DVD-V: A video and audio disc standard that also supports
multichannel digital audio SRs up to 48 kHz/24-bit, and 2-channel
digital audio at 96 kHz SR and 192 kHz SR, but there is usually not
enough room on the disc to fit high-quality video and high resolution
audio at the same time. When MPEG video takes up much of the space
on the disc, usually coded (data-reduced) formats such as DTS or Dolby
Digital carry the multichannel audio track.




DYNAMIC RANGE: The range in decibels between the highest level
which can be encoded and the lowest level which can be heard. Since
this is a perceptual, or ear-based determination, it is an approximate
number. In a properly dithered system, available dynamic range can be
greater than its measured signal-to-noise ratio. See Chapters 4 and 5.

E 

EDL: Edit decision list. Also known as Playlist. Instead of cutting the
actual audio, an EDL is a list of instructions of where and how to cut and
reproduce the audio when played back. Thus, many different versions
or playbacks of the same audio can be reproduced from the audio files.
An EDL is to audio as a word processor is to words.

E-E: Pronounced "E to E.” Electronics to electronics. For example, 
when a tape recorder is put into record, its output monitors its input 
directly. This mode is known as E-E. 

EMPHASIS: In an effort to improve the already-excellent signal-to-noise
ratio of the Compact Disc, CDs (as well as digital tapes) can be
recorded with emphasis. If it is decided to use emphasis, the recording
is made with a calibrated high frequency boost (called Emphasis), and
during playback, a corresponding high frequency rolloff (called
Deemphasis) is applied. Thus, in theory, signal-to-noise is improved,
though in practice the loss of high frequency headroom may reduce any
audible improvement. Most CDs made today do not use emphasis.

F 

FIR VS. IIR: FIR stands for finite impulse response and IIR for infinite
impulse response. These are types of filters which can be implemented
in equalizers. All analog equalizers behave like IIR filters, in that there
are no unnatural delays, just phase shift when the equalization is
changed. In contrast, an FIR equalizer can only be implemented in
digital circuitry, and has only been implemented in a few user-operated
designs because of its cost. An FIR equalizer can be made linear phase,
that is, with no change in time delay as the equalization is raised or
lowered. But this is done at the price of yielding a pre-echo, a time delay
before the sound occurs. This is something which cannot occur in nature,
so perhaps the ear may never get used to it if the echo is spaced far
enough away from the original signal. See Appendix 10.

FIREWIRE: The name of a high-speed bi-directional serial interface
originally developed by Apple Computer, but then officially adopted as
standard IEEE 1394, for use with digital audio, video, hard drives,
controllers, etc. See Appendix 7.

FIXED-POINT VS. FLOATING POINT: Fixed-point is the language of
the AES/EBU interface, so all devices must speak Fixed-Point on their
inputs and outputs. Thus, if a processor uses floating point, it must
convert to and from Fixed. The Motorola-based DSP processors use
Fixed-point math, and Texas Instruments and AT&T processors use
Floating-point. Fixed-point arithmetic can only represent a dynamic
range equal to the wordlength, e.g. 24-bit fixed point can only represent
144 dB of range and 48-bit (double precision) yields 288 dB. But
Floating point processors can represent thousands of dB. The downside
of floating point is that the noise floor changes with the precision,
which can cause noise modulation.

All other things being equal, 32-bit floating point is roughly equivalent
in absolute signal-to-noise ratio to 24-bit fixed, but in general, 32-bit
float outperforms 24-bit fixed. This is because 24-bit fixed only has
24 bits of precision when the absolute level of the signal is 0 dBFS. As the
level of the signal decreases the precision decreases. For each 6 dB, you
lose one bit. In contrast, 32-bit float provides 24 bits of precision
independently of the absolute level of the signal (until the level is
extremely low or high).

Assuming equally skilled designers, 48-bit fixed point is probably
better (cleaner) than 32-bit float, but 40-bit float and 64-bit float
trump them all!
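
The figures quoted in this entry follow from roughly 6.02 dB per bit. The sketch below reproduces the arithmetic under that assumption only; it does not model any real DSP chip, and the function names are illustrative.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB for each bit


def fixed_point_range_db(wordlength_bits: int) -> float:
    """Theoretical range a fixed-point word of this length can represent."""
    return wordlength_bits * DB_PER_BIT


def fixed_point_precision_bits(wordlength_bits: int, level_dbfs: float) -> float:
    """Bits left to describe a signal peaking at level_dbfs (0 or negative)."""
    return wordlength_bits + level_dbfs / DB_PER_BIT


if __name__ == "__main__":
    print(fixed_point_range_db(24))               # about 144 dB
    print(fixed_point_range_db(48))               # about 289 dB
    print(fixed_point_precision_bits(24, -60.0))  # about 14 bits at -60 dBFS
```

The last line illustrates the "lose one bit for each 6 dB" remark: a signal 60 dB below full scale in a 24-bit fixed system is described by only about 14 bits.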

FRAMES: There are two commonly used "frame” standards in CD
work, with different lengths: 75 CD Frames in a second, as opposed to 30
SMPTE frames per second. Modern PQ lists are usually expressed in CD
Frames, but the older 1630 systems used SMPTE frames, which have
less timing resolution.
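
To make the difference in timing resolution concrete, the sketch below converts between frames, samples and minutes:seconds:frames. It assumes the 44.1 kHz CD sample rate and non-drop 30 fps SMPTE; the helper names are illustrative.

```python
CD_FRAMES_PER_SECOND = 75
SMPTE_FRAMES_PER_SECOND = 30
CD_SAMPLE_RATE = 44_100  # Hz


def samples_per_frame(frames_per_second: int, sample_rate: int = CD_SAMPLE_RATE) -> int:
    """Number of audio samples spanned by one frame."""
    return sample_rate // frames_per_second


def cd_frames_to_msf(total_frames: int) -> tuple:
    """Split a CD frame count into (minutes, seconds, frames)."""
    minutes, remainder = divmod(total_frames, 60 * CD_FRAMES_PER_SECOND)
    seconds, frames = divmod(remainder, CD_FRAMES_PER_SECOND)
    return minutes, seconds, frames


if __name__ == "__main__":
    print(samples_per_frame(CD_FRAMES_PER_SECOND))     # 588 samples, about 13.3 ms
    print(samples_per_frame(SMPTE_FRAMES_PER_SECOND))  # 1470 samples, about 33.3 ms
    print(cd_frames_to_msf(4585))                      # (1, 1, 10)
```

A track start expressed in CD Frames can therefore be placed two and a half times more finely than one expressed in SMPTE frames.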

G 

GAIN, LOUDNESS, VOLUME AND LEVEL: Distinctive terms each
with their own meaning, carefully distinguished in Chapter 14.

GLASS MASTER: Glass Mastering is the process of transferring the CD
master (either on PCM-1630 tape, recordable CD, or Exabyte tape) to a
physical image of the pits on a coated glass substrate. See Chapter 1.

J 

ISRC: International Standard Recording Code, defined by the RIAA as a
unique code for each track on the CD. See Chapter 20.

JITTER: Timing variations in the digital audio clock, producing
distortions. See Chapter 19.

K 

KHZ: Abbreviation for kiloHertz, meaning audio frequency in
thousands of cycles per second. Commonly this usage also applies to
sampling frequency. To avoid confusion, in this book, we sometimes
add the letters SR to help distinguish sample rate, for example, 44.1 kHz
SR, from audio frequency, for example 5 kHz.




K-SYSTEM: An integrated system of metering and monitoring devised
by the author (Ch. 15).

K-STEREO™, K-SURROUND™: Patent-Pending (still as of 2002)
processes for extracting and enhancing the already existing ambience of
recordings. See Chapter 13.

M 

MLP: Meridian Lossless Packing, a data-reduction technique which
made it possible to fit as many as 6 high quality channels of digital audio
at 96 kHz SR on a DVD-A disc.

N 

NORMALIZATION: An automatic process available in most DAWs,
whereby the gain of all program material is adjusted so the peak level
will just arrive at 0 dBFS. There are many esthetic and technical reasons
to avoid normalization. See Chapter 5.
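
For reference only, the mechanics of peak normalization can be sketched as below. The example assumes floating-point samples where 1.0 represents 0 dBFS, and the function names are illustrative; it shows what the process does, not a recommendation to use it.

```python
import math


def normalization_gain(samples: list) -> float:
    """Linear gain that brings the highest peak exactly to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 1.0  # silence: nothing to scale
    return 1.0 / peak


def normalize(samples: list) -> list:
    """Apply the same gain to every sample of the program."""
    gain = normalization_gain(samples)
    return [s * gain for s in samples]


def gain_in_db(gain: float) -> float:
    """Express a linear gain factor in decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    program = [0.1, -0.5, 0.25]
    print(gain_in_db(normalization_gain(program)))  # about +6 dB
    print(normalize(program))                       # peak now at -1.0
```

The gain is set by a single peak sample, so two programs normalized this way need not sound equally loud.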

P 

PLUG-IN: An extra process which can be inserted into a DAW. Some
plug-ins utilize the power of an external DSP card, while others, called
Native Plug-Ins, utilize the computer’s CPU.

PQ CODING: The Compact Disc contains a number of subcode areas;
each area is named with a letter, from P to W, with information on track
number, timing, and so on. See Chapter 1.

R 

RED BOOK defines the standards for the audio CD as defined by Sony
and Philips. No ordinary individual has a copy of the Red Book. The real
Red Book can only be found at authorized Compact Disc replication
plants. The Blue Book defines enhanced CDs with audio and ROM
material. Yellow Book defines CD ROMs. Green Book defines Compact
Disc Interactive. White Book defines the Video CD. Orange Book defines
CD-R or Recordable CDs.

RMS: Root-Mean-Square. A method of averaging levels which
computes the equivalent power of the material. For all naturally-occurring
music, an RMS-responding meter will read several dB below
the actual peak level of the music at any moment in time.
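
The gap between an RMS reading and the peak can be sketched with a test tone. The example uses a pure sine wave, whose RMS value sits about 3 dB below its peak; real music, as the entry notes, shows a larger gap. The function names are illustrative.

```python
import math


def rms(samples: list) -> float:
    """Root of the mean of the squared samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def peak(samples: list) -> float:
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def peak_to_rms_db(samples: list) -> float:
    """How many dB the RMS level sits below the peak level."""
    return 20.0 * math.log10(peak(samples) / rms(samples))


def sine(cycles: int, length: int) -> list:
    """A whole number of sine cycles at full amplitude."""
    return [math.sin(2.0 * math.pi * cycles * i / length) for i in range(length)]


if __name__ == "__main__":
    print(peak_to_rms_db(sine(10, 1000)))          # about 3.01 dB
    print(peak_to_rms_db([1.0, -1.0, 1.0, -1.0]))  # square wave: 0 dB
```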

S 

SACD: See DSD. 

SDIF-2: Sony Digital Interface-2. The stereo version uses 3 cables, one
for each channel and one for wordclock, thus avoiding the interaction
between clock and data that causes interface jitter in the competing
AES/EBU or S/PDIF interfaces.


SEGUE: A crossfade between two different types of music, pronounced
seg-way, from the Italian seguire, meaning to follow.

SNR: The abbreviation we use in this book for Signal to Noise Ratio.
SNR of a digital system is the decibel ratio between the highest level
which can be encoded (0 dBFS) and the dither noise. Since the noise
can be measured with different weightings, SNR is simply a number we 
can use to compare, but may have little relationship to the actual range 
the ear hears. Dynamic range represents more closely what the ear 
hears, but it's difficult to define precisely the absolute lowest levels we 
can hear in any particular digital system. See Chapters 4 and 5. 

S/PDIF: Shorthand for Sony-Philips Digital Interface. Standard IEC- 
958 and IEC-60958 defines this interface, usually found on an RCA 
(coaxial) connector. See Chapter 20. 

SR: The abbreviation we use in this book for Sample Rate, aka 
Sampling Frequency. 

SRC: (also abbreviated SFC) Sample rate converter, or Sample
Frequency Converter. See Chapters 18/19. A Synchronous SRC uses
fixed filter coefficients, can only convert between certain fixed rates,
e.g. 44.1, 48, 88.2 and 96 kHz, and cannot accept varispeeded sources.
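
The fixed rates named in this entry are related by exact whole-number ratios, which can be sketched as below. The helper name is illustrative, and the example only computes ratios; it does not perform any filtering or conversion.

```python
from fractions import Fraction


def conversion_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Exact output/input rate ratio, reduced to lowest terms."""
    return Fraction(target_hz, source_hz)


if __name__ == "__main__":
    print(conversion_ratio(44_100, 48_000))  # 160/147
    print(conversion_ratio(44_100, 88_200))  # 2
    print(conversion_ratio(96_000, 44_100))  # 147/320
```

A varispeeded source has no such fixed ratio to the output rate, which is the case an asynchronous converter (see ASRC) is meant to handle.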

STATE MACHINE: A state machine is defined as any type of processor
which produces identical output for the same input data, and which
does not look at data timing or speed, but only at the state or recent
history of the data. Most digital processors are state machines and thus
are completely immune to jitter. (See Chapter 19).

T 

TRUNCATION: Reduction of wordlength by cutting off the lower bits. If
dithering was not performed first, then simple wordlength truncation
causes distortion.

W 

WEIGHTING: When measuring noise, weighting applies a non-flat
frequency response curve in an attempt to correlate better to what the
ear hears. A-weighting is one of the most primitive curves, based on a
simple model that the ear hears low frequencies and high frequencies
less than mid frequencies. Other types of weighting include CCIR or IEC,
also outdated by the latest psychoacoustic research. The most accurate
curve is called F-Weighting, but even so, applying a single weighted
number to the measured noise floor of an amplifier is still deceiving. A
single number has little relationship to the more complex way in which
the ear really works. Ultimately, the impact of noise should be
interpreted based on individual time and level analysis of each critical
band of the ear.





A 

A/D with built-in compressor 65 
A/D/A converter, in block dia. 37
Absolute loudness, defined 189 
AbsoluteTime Reference (SMPTE) 258 
Accomodation by the ear/brain 108 
Adams, Bob 222, 234 
ADAT interface, limits to 20 bits? 59
AES-17, peak and average reference point 190 
AES-31 282 

AES/EBU and SPDIF connections, debugging 247-252 
AGC, defined 116 

Analog recording, why it can sound louder 65
Analog synchronizers 245 
Analog tape simulation 255 
Analog versus Digital Processing 201-208 
Aphex 153 

Apogee UV22 (dither) 56 
ATRAC 282 
Attack time 119

Distortion with fast attack/release 120 
Audiocube 210 
Audio Toolbox 157 

Average vs. Peak level measurement 167 

B 

Backdrop 141 
Backups/Archives 33 
Baker, Clete 288

Bandwidth limiting, audible effects of 44
Barbabatch 280 
Bariska, Andor 6 
BBC PPM, attack time 122 
BBE 153 
Benchmark 73 
Bertini, Charlie 4 
Bethel, Tom 264 
Bevelle, Mike 138 
Bit Transparency 208
Bits, how many is enough? 200-201 
Bitscope, photo of oscilloscope 38 
software-based (color image) 178 
BLER 32 


Index 


Block Diagram, mastering, and Wire Numbers 38 
simplified 35-38 
Bonger - A Listening Test 208
Buchalter, BJ 5 
Buchalter, Stu 5 
Burroughs, Lou 220 
Burtt, Ben 34 

C 

Cable lengths, AES, S/PDIF 249 
Capacity of various media, table 295

Carnegie Hall Chart of musical instruments, frequencies, and more: 

Inside front cover 

CD Text, explained 254 

CDR, reliability as master? 23 

Cedar 140

Clipping, removing? 64 
Clover system 32 
Collins, Mike 5, 74 

Comb Filtering, audible effects of 44-46 

Compact Disc, steps from conception to manufacture 17 

Compilation CDs 253 

Compression and Car CD players 265

Concept album 87

Cranesong HEDD 21, 153, 204-206

Cranesong STC-8 152 

CRC 31 

Crookwood 40 

Crossfade, to change levels 115 

D 

D'Antonio, Dr. Peter 82 
DAE-3000 23 
DARS Sync 247

DAT interfacing, debugging 247 
DAT, suitable as master? 23 
DAW, picking the right one 23-24 
Audiocube 24 
Pyramix 24 
SADiE 24 
Sequoia 24 
Sonic Solutions 23 
Wavelab 24 




dBFS, dBm, dBu, defined 167 
DBX Quantum II .53 
DC Offset Removal 148
DDP 23 
Decibels 

Conversion to Flux 292
Conversion to Voltage 293
Decibels, as a ratio, always 167 
Delay Mixing 219 

Depth and Dimension, how to achieve 211-220
Balancing the Orchestra 215
Impediments to achieving good depth 220
Monitor accuracy and 219 
Using early reflections 214 
Using Frequency Response 215 
Digital Domain studio (color image) 184 
Digital Monitor Controls 257 
Digital Performer 29 
Directivity Of Musical Instruments 216 
Disc-At-Once 93 
Dither and wordlength 

Auto dither and auto black 59 

Cumulative dithering degrades sound 57 

Explained 49-60 

For A/D Converter 50

Low level test of dither effectiveness 57 

Noise shaping and 56 

POW-R 55-56 

Practical examples, how to use dither 58 
Redither(ing) 52 
Self-dither? 51 

Word lengths expand with DSP 53
Dither vs. truncation, measurements 203 
DLT 23 

Dobson, Richard 282 
Dolby 113, 138, 140, 186 
Dorrough 38 
Dorrough Meter 189 

Double Sampling and oversampling 207-208
In Equalizers 210 
Downsampling 222 
Drawmer 113, 153 
DSD 225-226 

Dubbing (copying), analog to digital and vice versa 71-73




Dunn, Chris 233 

Dunn, Julian 6, 226, 241, 244, 247, 250
DVD, in Manufacturing 19 
DVD-A 226 
uses MLP 33 

DVD-R vs. DVD-RAM 256
Dynamic Range, defined 109 
Dynamics 

Clipping, Soft Clipping and Oversampled Clipping 127 
Compression and your Monitors 125 
Compression techniques (downward) 123-125 
Downward processors 117-123 
defined 112 

Compression, upward (parallel compression) 133-134 
Defined 112 
In Dolby System 138 
In VCA (Solid State Logic) 138 
With Digital Performer 135 
With Pro Tools 135 
With SADiE 135
With TC System 6000 135 
With Weiss 135 
Decreasing 111-112 

Downward compression and upward expansion, compared 138
Equal-Loudness Comparisons 123
Expansion, downward, defined 112
Expansion, Upward 136-137
Defined 112
The uncompressor 136
Familiarizing the ear with 46
Hypercompression, in mixing and tracking 128-132
In Musical History 110
Increasing 112 

Manual Gain-Riding, the art of 113-116 
Microdynamics vs Macrodynamics 109-110 
Multiband processing 125-127 
Ratios and Thresholds 124-125
Stereo Image, and Depth 127 
TC Electronic System 6000 135 
Weiss DS1-MK2 135

E 

E-E, some DAT machines pass 24 bits 60 
Ear training 41-48
Passive vs. Active 42 


Editing 93-97 
Adding room tone 95-96 
Adding tails 95 

Fadeouts and Follow Fades 94-95 
Head and Tail cleanup-the art of 94
Repairing Bad Edits 96
Edits, recognizing 47
83 dB SPL, why? 186-187
83 versus 85, how did it happen? 296-297
Electronic delivery, and QC 33 
Emphasis (preemphasis) 258 
Equalization 99-108 

Adding highs (cautions) 103 

As sibilance controller 108 

Bandwidth vs. Q 101-102, 293 (table of conversion)

Bass boosts (cautions) 106 

Baxandall 102-103 

Dynamic Equalization 108 

EQ yin and yang (one range affects another) 104

FIR 107-108

Fundamental or Harmonic? 105-106
High-Pass and Low-Pass Filters 103-104 
IIR 107 

Instant A/Bs? 105
Knowing when to leave it flat 106
Linear-phase Equalizers 107-108 
One channel or both (all)? 104 
Parametric vs Shelving 101 
What is a Good Tonal Balance? 100 
Error concealment 31 
Error testing 31 
Exabyte 23
Exciters 153 
F 

F-curve 210 
Fairchild 153 
Father 19 

Feathered, Tardon 271
FFT for Music, Spectrafoo 199-200
Jitter does not affect digital FFT 200 


File Formats for audio 279-282 
AIFF 280-282 
ATRAC 282 
MP3 282 

Resource Forks (Mac) and Extensions (PC) 279-280
SDII (Sound Designer II) 281-282 
WAVE and BWF 281 
Filters, errors of 222 
Finalizer (TC Electronic) 126, 151 
Flags (channel status bits) 251 
Folddown 37 
Foti, Frank 127, 271-272
+4 dBu, origin of this number 74
Framing and Timing Errors 238-239
Frequency ranges and their names, chart 43-44
The Fugitive, lack of dynamic range 110 
Fully Automated mastering 27 
Fuston, Lynn 62, 128 
G 

Gain Staging 

Analog Signal Chains 68-70 
Digital Signal chains 70-71 
Gain, defined (gain vs. level) 167 
Gerzon, Michael 6, 102, 136
Glass fiber 250
Glassmaster 18
Glass mastering at 1X speed 32
Glossary 307-309 
GML 210 

Model 9500 Mastering EQ 154 
Grimm, Eelco 224 
Grundman, Bernie 3, 105 
Grundy, Al 5
H 

Haas, Helmut 220 

Haas effect, harnessing 211-215
Hard disk formatting 257 
Harley, Bob 243 
Hawksford, Malcolm 233 
HDCD (Pacific Microsonics) 56-57 
Headroom, in analog and converters 66-68
When unbalancing connections 74 




Hearing loss 264 

High Sample rates, why, why not? 221-226 

Advantages of Remastering 16/44.1 Recordings at Higher Rates
226 

Holman, Tomlinson 171, 176, 186, 297 

Honor Roll of good-sounding CDs to emulate: See www.digido.com

House Video 247 

Hulse, Richard 133-134 

Humphrey, Marvin 5 

Hutchinson, Craig "Hutch" 156 

Hypercompression, fatiguing to the ear 265 

I 

Intensity, defined 166
Internal or external sync? 237
ISRC Codes 256
ITU 775 recommendation 171

J

James, Eric-biography (book editor) 301
Jensen, Deane 6 
Jensen, Ted 3 

Jitter-Separating the Myths from the Mysteries 227-244
A/D-Jitter 234

AES/EBU vs. wordclock sync 233
Analog Mixing 232-233
And CD copies? 237-238
ASRC 232
Autotune 231
Clock Accuracy 236-237
Off center clocks 239
Clock Stability Requirements 233
D/A-Jitter 234
Digital Mixing 232
Ephemeral jitter 231
Firewire 233

Interface jitter vs. sampling jitter 227

Internal or external sync? 237 

J-Test 241-244 

Jitter measurements 241-242 

Jitter reduction units, bandaid or cure? 235-236 

Jitter, defined 228-229 

On digital load-in? 237 

Reclocking Circuit 239-240

The Internet and Jitter 235

Weiss DAC jitter 243


Johnson, Chris 74 

Johnston, Jim 4, 57, 73, 82, 108, 166, 210, 225-226 
Johnston, Robert Bristow 224 

K 

K-14 Meter, color image 177 

K-20 Meter (color image) 177 

K-Stereo and K-Surround Processors 154

K-System Meters, ballistics, scale, etc. 296-298 

K-System Proposal, defined 189-195 

Katz, Bob-biography 302-306 

Kent, Mary 3 

Kessler, Ralph 5 

Knee (of compressor) 119 

Konar, Mithat 50 

L 

Lavry Engineering (formerly db Technologies) 56 
LEDR test for monitor accuracy 82 
LEQ 190 

LEQA, and K-System 297 
Level, defined 166 
Leveling The Album 97-98 
The Domino Effect 98 
Levels, measuring and interpreting 61-74 
Over level, measuring and interpreting 61-64 
Practice Safe Levels, in 24-bit recording 64 
Linearity, testing for good 55
Lipshitz, Stanley 60 
Locanthi, Bart 60 
Logging: Preparation Logs 21 
Logs and Labels for tapes, files and boxes 289-290 
Loudness race 187-188
Loudness, defined 166 
Loudness, judging 65 

With single D/A Converter 66 
Loudness, judging by monitor position 168 
Ludwig, Bob 3-4, 30, 128, 198, 278
Lund, Thomas 6, 73
M 

Manley Massive Passive 155 
Vari-Mu 156 
MaxxBass 156 




Maselec Model 2012 158 
Massenburg, George 5, 101, 108, 154 
Mastering 
Defined 11 
Workflow 25-29 

Maximum CD Program Length 285
McMillan, Mike 288 
Mead, Margaret 269 
Meadows, Glenn 3, 6, 12, 74, 258 
Mediatwist 248 
Meridian (dither) 56 
Metadata, defined 195-196 
Dialnorm 195 
Mixlev 196 
Metivier, Dan 5 
Metric Halo Mobile 1/0 157 
MIDI program changes 27-29 
Millennia Media 156 

NSEQ-2 156, 202 (measurements)

MLP: Meridian Lossless Packing 33 
Monitor (control) Position, defined 167 
Monitor Balance, interchannel 145
Monitor Calibration 165-176 

Bass Management and subwoofer adjustment 174 
Calibrating and assessing the system 170-176
Different Size Rooms 169-170
Monitor Equalization? 176
Phantom Center Check 173

Using A Calibrated Monitor System for Level and Quality Judgments
168-169 

Monitor Gain vs. Monitor Level 167 
Monitor Selector, digital 37 
Monitoring, philosophyof 75-82 

Adding high end--what's wrong with that? 80 

Alternate Monitoring Systems 81 

Avoid time-domain errors 82 

Beauty versus accuracy? 78-79 

Compression of program material, and monitors 80 

High resolution monitor system 75-76 

Monitor Equalization—by ear? 77 

Nearfields and their errors 79 

Real-World Monitor speakers? 78 

Soffit mounting 82 

Subwoofers and 76-77 


Typical Monitor Speakers? 79 

Why Accurate Monitors Needed: Bell curve theory 77-78 
Moorer, Dr. J. Andrew 141, 226, 231, 243 
Mora, Matthew Xavier 223-224 
Moses, Don 243 
Mother 19 
MP3 282 

MS Mastering techniques (compression, balancing, equalization) 149-151 
Mytek 38 

N 

NARAS, mastertape delivery recommendations 285 
Natural loudness, preferred? 196 
NC noise rating of monitoring room 76 
Nesbitt, Alec 114 

Nichols, Roger 9-10 (Foreword), 288 
Nielsen, Soren 73 
Noise and Distortion, defined 139 
Noise Reduction 139-144 
Algorithmix 142 
Audiocube 142 
Backdrop 141-142 
Cedar 140, 142 

Clipping, reducing the effects of 143 
Complex Filtering 141 
Expansion 141 
GML 142 

Manual Declicking, Dethumping, De-Distortioning, Depopping 142-144 
No Noise 140 

Retouch (Cedar, Sadie) 142 
Sonic Solutions 140-141, 143 
TC Electronic 142 
Waves 142 

Normalization, the myths of 65-66 
Null test 31, 208 
O 

Objective versus subjective assessment 197-210 
Olhsson, Bob 3, 92, 130, 149, 202, 223 
OMF 282 

Optical Cables 250 
Orban, Robert 127, 271-272 
Overload, familiarizing the ear with 47 
Oversampling 222-223, 226 




Oversampling peak meter 64 
Owsinski, Bobby 128 

P 

Pacific Microsonics 56 
Paper Labels, do not use 287 
Parth, Ernst 223 

Patching Order of Processes 151-152 
PCM-1630 23, 30-32 
PCM-9000 23 

Perception vs. Measurement 197-210 
Phase shifts and Azimuth Error 147-148 
Cedar Azimuth Corrector 148 
Pink noise, uncorrelated 172 
Pitch and Time Correction 148 
Pitch perception 44 
PLL 228 

Defined 239-240 

Plug-ins vs. Stand-Alone Processors 152 

PMCD, needed? 256 

Polarity problems, recognizing 48 

Polarity, absolute, fixing absolute polarity 147 

Polarity, relative, Fixing Relative Polarity 145-147 

PPM, analog 73 

PQ (track) Coding 

And album spacing 91-92 
And Processor Latency 93 
Hidden tracks, within the album 92 
Hidden in the pregap 93 
PQ Offsets 93 

Typical DVD Players not obeying end of track marks 92 
PQ Lists 
Defined 22 

Preparing Tapes and Files for Mastering 283-288 
Analog tape Preparation 285-286 
Logs to accompany tapes and files 284 
Preparing CD ROMs/DVD-R files 286-288 
What Sample Rate to use? 286 
Prism 56 (dither), 241 (jitter) 

ProTools 135 

Producer, Mastering Without a Producer Present? 24-25 
Proximity effect, sound of 47 
Pultec 153 
Pyramix 22 




Q 

Q and Bandwidth conversion table 293 
Quality Control 30-33 
Quantization 49-60 

R 

Radio Ready: The Truth 271-278 
AGC, in radio broadcasting 275 
Clipping, in radio broadcasting 277-278 
Equalization, in radio broadcasting 276 
Hypercompressed CDs-sound worse on radio 273 
Multiband Compression, in radio broadcasting 276-277 
Phase rotator, in radio broadcasting 274 
Stereo Enhancement, in radio broadcasting 275-276 
Rayburn, Ray 6 

Recommended reading 299-300 
Reflection-free zone 76 
Reid, Gordon 109, 140 
Release delay 120 
Release time 119 

Reverberation Processors, testing sound quality of 157 
RME 243 
Router 35 
Rusby, Jim 254 

S 

S/PDIF voltage levels 249 
S/PDIF, defined 258 
SACD 226 

SADiE 22, 24, 28, 29, 32 

Sample rate, higher sounds better? 61, 221-226 

Sax, Doug 3 

Scott, Rusty 224 

SCSI 24 

Segue 91 

Sequencing, the art: Putting an Album in Order 87-90 
Sequoia 22 
Seva 5 
Shred 64, 74 

Sibilance Controllers 157-158 
Maselec 2312 158 
TC System 6000 158 
Weiss DS1-MK2 158 


Single Precision, Double Precision, or Floating Point? 206-207 
Sintefex Convolution Processor 158 
Smith, Noel 6 
SNR 

not improved with normalization 66 
of analog media 61 
with 24-bit recording 65 
with analog chains 69 
with digital chains 70 
with digital gain boost 71 
Sonic Solutions 21, 23, 24, 28, 32, 33, 115, 140 
Soulodre, Dr. Gilbert 141 
Source-Quality Rule 209-210 
Space and Depth, familiarizing the ear with 46 
Spacing The Album 90-91 
SpectraFoo, FFT specifications 199-200, 210 
Spectragram of Bass frequencies 177 
Speed (transfer capacity) of various media, table 294 
DSL 294 
Ethernet 294 
Firewire 294 
SCSI 294 
USB 294 

SPL (brand) Machine Head 153 
SPL, defined 167 
Stamper 19 

Standalone CD Recorders 93 

Steinberg Magneto 153 

Stems (submixes) 149 

Stereo Balance, checking 146 

Stereo position indicator 178 (color image), 199 

Stockham, Tom 226 

Story, Mike 226 

Stout, Dan 92 

Strauss, Konrad 4 

Stuart, J. Robert 200 

Studio block diagram 35-38 

Sutotin, Andre 288 

Super Bit Mapping (SBM) 56 

T 

Tascam DA-45 DAT, pictured 37 
TC Electronic 126 
Finalizer 151 


System 6000 37, 39, 158 
Test CDs 299-300 
3 to 1 rule 217 

Timecode and Wordclock 245-246 
Pull-ups and Pull-downs 246 
Titanic, good dynamic range 110 
Tonal balance Preference, bright? 41 
Toslink 250 
Track-At-Once 93 
Tracks, hidden 93 
Travis, Chris 208 
Truncation 53-54 
24-bit vs. 16 64-65 
Two-wire/4-wire 96K and 192K 252 
U 

Unbalanced connections in the Mastering studio 255 
UPC/EAN 256 

V 

Vanderkooy, J. 60 
Vinyl and Cassette 256-257 
Volume, defined 166 
VSP, pictured 37 
VU meter 65 

characteristics 185-186 

W 

Washburn, Steve 6 
Watkinson, John 107-108 
Wavelab 22 
Waves 150, 155-156 
C4 29 

IDR dither 155 

L2 155, 207 (measurements) 

MaxxBass 156 

Webboard 6, 130, 271, 288, 223 
Weiss 210 

DS1-MK2 125, 158-159, 207 (measurements) 
EQ1-LP 159 
Weiss, Daniel 6, 107 
Windows, with SADiE 24 




Wordclock, NTSC to wordclock converter 38 
Voltage standards 246 
Wordlengths, expand with DSP 52-53 

Z 

Z Systems 

ZK-6 K-Surround processor 159 
ZQ-2 159, 203 (measurements) 

Router 35-37 
Zelniker, Glenn 6 

Zwicker 190, 195, 297 (in K-System) 




Afterword 


How This Book Was Written and Edited 

This book was collaboratively produced by two individuals (writer and editor) located on opposite sides 
of the globe. Computer technology and the Internet have advanced the non-linear process of writing and 
editing a book; proofreader's marks and symbols have become obsolete. Instead, we have Microsoft to 
thank for providing two little-known features in Word called Track Changes and Comments. Through 
these features, Eric and I were able to interact, exchange document revisions, annotate and comment the 
text, and view each other's changes. 

I created a system for all the author’s output to be odd-numbered revisions, and the editor to respond 
with even numbers. Each revision was in its own document (we did not use version-tracking, which has 
limitations). So as each chapter progressed, it would be incrementally numbered, and it was easy to see its 
status and who had produced the last revision. 

When it came time for fact-checking, Jim Johnston added his comments and Word correctly identified 
JJ as their origin. I worked on a Macintosh and Eric and Jim on a PC, but fortunately Microsoft Word 
transcends operating systems. 

I think it most appropriate that my interior graphic designers Toni and Thuan have chosen to set this 
book in a typeface named Filosofía. 


Bob Katz, Orlando 2002 


BECOME A MASTER OF AUDIO 

This book is for everyone who wants to increase their mastery of digital 
and analog audio: musicians, producers, A&R, mastering, recording and mixing 
engineers. It is suitable for all levels of students and professionals. 

To master audio you must become a master of audio. 

WHAT IS MASTERING? 


Mastering is the last creative step in the process of 
producing a record album, compact disc, DVD-A or 
SACD. Bob Katz unravels the technical mysteries and 
explains the artistic techniques. Don’t leave for the 
studio without this book! 

Mastering Audio discusses audio philosophy and art: 
sequencing, leveling, processing; how to make a record 
album radio-ready; mixing as it relates to mastering. 
Plus, leading-edge audio concepts in an easy-to-grasp, 
holistic manner, including an ear-opening investigation 
of the mysteries of jitter, dither and wordlengths, high 
sample rates, distortion, headroom, monitor calibration, 
metering, depth perception, compression and 
expansion, equipment interconnection and much more. 


"Bob Katz’s well thought out book on mastering 
is indeed a welcome addition to anyone that works 
in the production of music and who wants to understand 
all aspects of this so called mysterious stage of 
music production... 

There is enough information here for filling in 
gaps that even seasoned mastering engineers might 
have." 

BERNIE GRUNDMAN, BERNIE GRUNDMAN MASTERING, 
HOLLYWOOD, CA 

"Bob Katz is a true Jedi Knight of Audio." 

A.T. MICHAEL MACDONALD, MASTERING ENGINEER, ALGORHYTHMS, 
NEW YORK CITY 


Focal Press 
An Imprint of Elsevier Science 
www.focalpress.com 


"An excellent reference for anyone interested in 
CD Mastering. I don’t know of another single source 
with as much detailed information on the mastering 
process. Even industry veterans are guaranteed 
to pick up something they hadn’t known or were 
unsure of." 

TED JENSEN, CHIEF ENGINEER, STERLING SOUND, 
NEW YORK CITY 

"The first piece of equipment [you] should buy 
is Bob Katz’s Mastering Audio: The Art and the 
Science." 

ROGER NICHOLS (FROM THE FOREWORD), 
ROGER NICHOLS MASTERING, 
MIAMI 


"Bob is a master of the technology changeover 
from analog to digital. His book covers areas that 
none other have touched." 

GEORGE MASSENBURG, PRODUCER/ENGINEER, 
NASHVILLE, TN 

"Master This Book!" 

GLENN MEADOWS, MASTERING ENGINEER, NASHVILLE 

"When I first picked up this book, I couldn't put it 
down until I had read it all! This book should be required 
reading for all audio professionals, and not just in mastering. 
Every studio owner and engineer needs to know 
about this stuff." 

MIKE COLLINS, 
Author of PRO TOOLS FOR MUSIC PRODUCTION, LONDON 

ISBN 0-240-80545-3 




